Archive for the ‘Digital Preservation’ Category

K-12 Web Archiving Program

Saturday, August 8th, 2009

From the Article:

Following a successful pilot program during the spring of 2008, the Library of Congress, Internet Archive and California Digital Library initiated a web archiving program that explored archiving websites from the perspective of students in elementary, middle and high schools. Two Library activities supported the pilot: the National Digital Information Infrastructure and Preservation Program and the Teaching with Primary Sources program.

The K-12 Web Archiving Program gives students the opportunity to think about history by selecting sources for ongoing research use. Teens and younger students select and capture web content using Internet Archive’s Archive-It service, creating “time capsules” of what is important to them to represent their current lives.

During the 2008-09 school year, students from ten different schools in nine states participated in the program. Over 1,700 websites and 233 million URLs, or objects, were collected during the year, totaling 11.7 terabytes of data. The Internet Archive noted that 96 percent of the websites selected by students have not been archived by any other Archive-It partner, and 24 percent of the websites are not in the Internet Archive’s general archive. Examples include websites for the Iowa Farm Bureau, Women’s Adventures in Science, and How to Make a Sock Monkey. In total, 68 web collections were created – including a Prom Guide and Historical Black College Search collection – and made immediately accessible on the Archive-It website.

Students and teachers alike found the program eye-opening. Student comments included “choosing the websites was really fun because it let everyone be creative and really think about what teenagers enjoy today,” and “I had never thought of archiving websites, even though in this day and age we use them as much as and more than books.” Teacher Emily Patterson of George Washington High School in Charleston, West Virginia said, “I think it was certainly an enriching experience. I like that it allowed them to see and examine their lives and Internet content as history in the making.”

Source: National Digital Information Infrastructure and Preservation Program / Library of Congress

Web Archiving Service Preserves Data for the Future

Saturday, August 8th, 2009

Note: We first posted about this new web archiving service a few weeks ago. Here’s a bit more information.

From the Announcement:

Researchers and scholars now will be able to delve into archived Web sites captured by the California Digital Library’s Web Archiving Service (WAS). This new tool enables faculty, researchers and librarians to capture, curate and preserve Web sites, thus creating permanent archives available to researchers everywhere. The social history of our times is now being preserved in archives as rich and varied as the contentious 2003 California recall election, hundreds of California state Web archives, the Guantanamo Bay Detention Camp Web archive and the Middle East Political Sites archive. New archives continually are being built and published and will appear along with the current archives, available at webarchives.cdlib.org/.

The Web has revolutionized our access to information. Documents and publications that once were difficult to find now are readily available to anyone at any time. Popular reactions to historical events unfold via blogs and personal Web sites, and we have an unprecedented view into popular culture and the formation of public policy. “This is a tool that can track censorship in China, political regimes in Iran, and social commentary around the world,” states Laine Farley, California Digital Library’s executive director. “CDL and the UC libraries are leading the way in building collections for the 21st century.”

Ready access to these publications cannot be taken for granted. Web pages and documents are as easy to change or remove as they are to publish. When sites are redesigned, when new administrations take office, when policies or organizations change, we witness the wholesale disappearance of information. State and local Web publications particularly are at risk. In many cases, these documents no longer are available in print, and libraries are challenged to continue their historic role as cultural memory institutions in the digital environment.

Source: University of California

Pilot Program: Preservation in the Cloud

Friday, July 24th, 2009

From the Report:

The Library of Congress National Digital Information Infrastructure and Preservation Program has launched a pilot program to test cloud technologies for preserving digital content. The pilot will focus on using a new service, DuraCloud, developed and hosted by the DuraSpace Foundation.

Source: LC / NDIIPP

The British Library Publishes Annual Report (2008-2009)

Wednesday, July 22nd, 2009

Access the Complete Report (Flash Version) or Text Only Version

From the Chief Executive’s Page:

There is enormous public and educational interest in the digitisation of our historic newspapers and we end the year with some three million digitised and fully searchable pages available online. We are now poised to work with a commercial partner to significantly scale up this effort over the coming years. This year we have also completed the digitisation of around 70,000 books and 12,000 recordings.

[Snip]

We have taken significant steps forward in services to support the scientific research community this year. TalkScience events have proved popular: our science collections are supporting a wealth of research – from interpreting our recordings of frog calls and investigating volcanic activity in the 18th century to using our contemporary and rare biomedical journals to inform a study on lactose intolerance. UK PubMed Central is rapidly growing as an open-access database service, with new facilities being added regularly.

Source: BL

California Digital Library Public Web Archive Service Collections Launched

Wednesday, July 22nd, 2009

From the Article:

The California Digital Library has opened its Web Archiving Service collections. Topics in the collection range from California government agencies to Middle Eastern politics to natural disasters. The institutions that harvested and curated the websites include New York University, the University of North Texas, Stanford University and several University of California campuses.

Users can browse the public archives by URL or search by keyword. Users can also view changes over time for a given web site if that website was harvested more than once. This feature is especially useful when comparing something like the daily reporting on the 2007 Southern California wildfires or a quick check to see what new documents were added to a site from crawl to crawl.
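The crawl-to-crawl check described above can be pictured as a simple set comparison. What follows is only a rough sketch in Python, not how the Web Archiving Service itself works: the function name and the example.org URLs are made up for illustration, and a real archive would compare captured content as well as URLs.

```python
def new_documents(earlier_crawl, later_crawl):
    """Return URLs captured in the later crawl but absent from the earlier one."""
    # Hypothetical helper: each crawl is just an iterable of URL strings.
    return sorted(set(later_crawl) - set(earlier_crawl))


# Illustrative data only; these URLs are placeholders.
crawl_monday = [
    "http://example.org/",
    "http://example.org/reports/fire-update-1.pdf",
]
crawl_tuesday = [
    "http://example.org/",
    "http://example.org/reports/fire-update-1.pdf",
    "http://example.org/reports/fire-update-2.pdf",
]

added = new_documents(crawl_monday, crawl_tuesday)
```

Here `added` holds the one document that appeared between the two crawls.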

Direct to California Digital Library Web Archiving Service

Source: National Digital Information Infrastructure and Preservation Program at the Library of Congress

Library of Congress test drives cloud storage

Wednesday, July 15th, 2009

From the Article:

The Library of Congress National Digital Information Infrastructure and Preservation Program and DuraSpace have announced that they will launch a one-year pilot program to test the use of cloud technologies to enable perpetual access to digital content.

The pilot will focus on a new cloud-based service called DuraCloud, which replicates and distributes content across multiple cloud providers and enables organizations to share, access, and preserve said content. Eventually the service will also provide computing capabilities in addition to the storage and archiving functions. (DuraSpace is a joint effort of the Fedora Commons and the DSpace Foundation.)

Source: News.com

See Also: Official News Release from LC

Sustainable Strategies for Digital Resources

Tuesday, July 14th, 2009

From the Announcement:

Spending on digital resources is under the spotlight in an international study which aims to help the not-for-profit sector develop cost-effective strategies for financing technology.

As institutional budgets tighten, will these digital resources be able to survive and thrive?

The research, released today (15 July 2009) by the JISC-led Strategic Content Alliance and Ithaka S+R, looks at twelve different projects across the globe and how they are successfully identifying sources of support and generating revenue.

Access the Full Text (40 pages; PDF)

Source: Strategic Content Alliance & Ithaka (via JISC)

New Online: Library of Congress Digital Preservation July 2009 Newsletter

Tuesday, July 14th, 2009

Direct to Newsletter (2 pages; PDF)

Articles Include:

+ A report on the latest NDIIPP meeting

+ Digital Preservation Pioneer Jerry Handfield of the Washington State Archives

+ The first video in the NDIIPP Digital Preservation Video Series

+ News of the latest meetings and upcoming events

Source: The Library of Congress / National Digital Information Infrastructure

Speech: “How Are We Ensuring the Longevity of Digital Documents?”

Monday, July 13th, 2009

This plenary presentation by David Rosenthal was given at the CNI (Coalition for Networked Information) Spring Task Force Meeting. It was recently made available online. You can view the video here.

Source: CNI

NDIIPP Launches New Digital Preservation Video Series

Monday, July 6th, 2009

From the Announcement:

The Library of Congress National Digital Information Infrastructure and Preservation Program has released a new video: Bagit: Transferring Content for Digital Preservation.

Just over three minutes long, the video is aimed at librarians, archivists, and others interested in working with digital content.

The Bagit production is the first in a planned series of videos that will address specific digital preservation issues. Currently, the Library has a number of online video presentations featuring NDIIPP partners discussing their projects.

View Video/Access Transcript

Collection of other NDIIPP Videos

Source: National Digital Information Infrastructure and Preservation Program

New Article: Mining Contextual Information for Ephemeral Digital Video Preservation

Tuesday, June 30th, 2009

From the Abstract:

For centuries the archival community has understood and practiced the art of adding contextual information while preserving an artifact. The question now is how these practices can be transferred to the digital domain. With the growing expansion of production and consumption of digital objects (documents, audio, video, etc.) it has become essential to identify and study issues related to their representation. A curator in the digital realm may be said to have the same responsibilities as one in a traditional archival domain. However, with the mass production and spread of digital objects, it may be difficult to do all the work manually. In the present article this problem is considered in the area of digital video preservation. We show how this problem can be formulated and propose a framework for capturing contextual information for ephemeral digital video preservation. This proposal is realized in a system called ContextMiner, which allows us to cater to a digital curator’s needs with its four components: digital video curation, collection visualization, browsing interfaces, and video harvesting and monitoring. While the issues and systems described here are geared toward digital videos, they can easily be applied to other kinds of digital objects.

Direct to Complete Article (18 pages; PDF)

Source: The International Journal of Digital Curation

The Summer, 2009 Issue of Muse News (Project Muse) is Now Available

Tuesday, June 30th, 2009

Direct to Issue (4 pages; PDF)

Articles Include:

+ Project MUSE announces new titles and prices for 2010

+ MUSE and Social Technology

+ New Features and Functionality Enhance MUSE Experience

+ Using MUSE to Your Advantage: More By an Author

See Also: Project Muse Facebook Page

Source: Project Muse

Meeting the Challenge: Digital Content Transfer Tools

Monday, June 29th, 2009

From a Post:

The Library of Congress has developed new tools to transfer large quantities of digital content. During 2008, the Library used these tools to add approximately 80 terabytes to its digital collections.

As described in the Library of Congress’s video, Bagit: Transferring Content for Digital Preservation, the sender of a digital collection prepares for the transfer by packaging the collection and making it accessible for the Library to download. The Library prefers data packaged into standardized “bags,” a means of organizing and containing data for transfer as described in the BagIt specification.
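For readers curious what a "bag" looks like on disk, here is a minimal sketch in Python. It is not the Library's transfer tooling. It assumes the basic layout from the BagIt specification (a data/ payload directory, a bagit.txt declaration and a checksum manifest), and the function names, the choice of SHA-256 and the version string are our own assumptions.

```python
import hashlib
from pathlib import Path


def make_bag(source_dir: Path, bag_dir: Path) -> Path:
    """Copy files from source_dir into bag_dir/data and write the tag files."""
    data_dir = bag_dir / "data"
    data_dir.mkdir(parents=True)
    manifest_lines = []
    for src in sorted(p for p in source_dir.rglob("*") if p.is_file()):
        rel = src.relative_to(source_dir)
        dest = data_dir / rel
        dest.parent.mkdir(parents=True, exist_ok=True)
        dest.write_bytes(src.read_bytes())
        digest = hashlib.sha256(dest.read_bytes()).hexdigest()
        # Manifest line: checksum, whitespace, path relative to the bag root.
        manifest_lines.append(f"{digest}  data/{rel.as_posix()}")
    # Bag declaration; the version string here is an assumption.
    (bag_dir / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n",
        encoding="utf-8",
    )
    (bag_dir / "manifest-sha256.txt").write_text(
        "\n".join(manifest_lines) + "\n", encoding="utf-8"
    )
    return bag_dir


def verify_bag(bag_dir: Path) -> bool:
    """Recompute every payload checksum and compare it with the manifest."""
    manifest = (bag_dir / "manifest-sha256.txt").read_text(encoding="utf-8")
    for line in manifest.splitlines():
        if not line.strip():
            continue  # an empty bag produces a blank manifest line
        expected, rel = line.split(maxsplit=1)
        actual = hashlib.sha256((bag_dir / rel).read_bytes()).hexdigest()
        if actual != expected:
            return False
    return True
```

The sender would run something like make_bag before publishing the collection for download, and the receiver would run verify_bag after the transfer to confirm that nothing was lost or altered in transit.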

Direct to Video

See Also: Read more about the Library’s bag-related data transfer tools.

Source: National Digital Information Infrastructure and Preservation Program / Library of Congress

Portico Announces Digital Preservation Agreement with Emerald

Friday, June 26th, 2009

From the Announcement:

Portico (www.portico.org) is pleased to announce the signing of an agreement with Emerald Group Publishing Limited to preserve its entire online journals collection. Established in 1967, Emerald Group Publishing Limited is the world’s leading publisher of management research. In total, Emerald publishes over 700 titles, comprising 200 journals, over 300 books and more than 200 book series as well as an extensive range of online products and services.

Source: Portico

UK: A New Project to Learn What it Takes to Archive Blog Content

Wednesday, June 24th, 2009

From a Web Site:

ArchivePress is a blog-archiving project being undertaken by the University of London Computer Centre and the British Library Digital Preservation department, funded by the JISC Information Environment Programme under its Rapid Innovation Grants Call (03/09).

The project will explore practical issues around the archiving of weblog content, focusing on blogs as records of institutional activity and corporate memory. As an alternative to the web crawling/harvesting approach of the Internet Archive and the UK Web Archive, ArchivePress will test the viability of using RSS feeds and blog APIs to harvest blog content (including comments, embedded content and metadata). The archived content will be stored and managed using instances of WordPress, thereby maintaining the blogs’ native data structures, formats and relationships. We hope to develop tools and methodology that will enable organisations to use simple, free, open source blogging software to manage a central archive of designated institutional blog outputs, even if they are spread over different blog hosts and platforms. The benefits of this approach will include:

+ targeted gathering of selected weblogs

+ improved reliability and authenticity of records

+ citable blog content with persistent identifiers

+ automated, ongoing harvesting, via newsfeeds

+ accessibility of content, using native blog interfaces

+ use of native web and database file formats, compatible with registry-based preservation activities.
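To make the feed-based approach a little more concrete, here is a rough sketch in Python of harvesting posts from an RSS 2.0 feed. It is not ArchivePress code (the project stores content in WordPress). The sample feed, the function names and the in-memory dictionary standing in for the archive are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# A made-up RSS 2.0 feed; a real harvester would fetch this over HTTP.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example Institutional Blog</title>
    <item>
      <title>First post</title>
      <link>http://blog.example.org/first-post</link>
      <guid>http://blog.example.org/?p=1</guid>
      <pubDate>Wed, 24 Jun 2009 09:00:00 GMT</pubDate>
      <description>Hello, archive.</description>
    </item>
    <item>
      <title>Second post</title>
      <link>http://blog.example.org/second-post</link>
      <guid>http://blog.example.org/?p=2</guid>
      <pubDate>Thu, 25 Jun 2009 09:00:00 GMT</pubDate>
      <description>More to preserve.</description>
    </item>
  </channel>
</rss>"""


def parse_items(feed_xml):
    """Extract one dictionary per <item> element in an RSS 2.0 feed."""
    root = ET.fromstring(feed_xml)
    items = []
    for item in root.iter("item"):
        items.append({
            # Fall back to the link when a feed omits <guid>.
            "id": item.findtext("guid") or item.findtext("link"),
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "published": item.findtext("pubDate", default=""),
            "content": item.findtext("description", default=""),
        })
    return items


def harvest(feed_xml, archive):
    """Add unseen posts to the archive dict and return their identifiers."""
    added = []
    for post in parse_items(feed_xml):
        if post["id"] not in archive:
            archive[post["id"]] = post
            added.append(post["id"])
    return added


archive = {}
first_run = harvest(SAMPLE_FEED, archive)
second_run = harvest(SAMPLE_FEED, archive)
```

The first run stores both posts, and the second finds nothing new. That is the property that makes repeated, automated harvesting from a feed safe to schedule.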

Direct to ArchivePress Web Site

Source: ArchivePress
Hat Tip: The JISC-PoWR Blog (Excellent Overview of Project)

New Digital Preservation Briefing Documents

Tuesday, June 23rd, 2009

From a Blog Post:

We have recently published a number of new briefing documents on digital preservation. The new documents cover Introduction to Web Resource Preservation, Preserving Web 2.0 Resources, Preserving Your Home Page, Selection for Web Resource Preservation and Web Archiving.

Source: Cultural Heritage (Blog) from UKOLN

Preservation: Which Images From 2009 Will Future Generations Want to Revisit?

Tuesday, June 23rd, 2009

From a Guardian Article:

For Robin Baker, the head curator of the BFI [British Film Institute] archive, more important still are the grandchildren yet to come. His stock of thousands of miles of film and documents on television and film stretches, as you would expect, far into the past, but it also reaches for the future. Each week the archive, which is housed around a group of old farm buildings in Berkhamsted, Hertfordshire, selects a range of current images from feature films, commercial network television and the visual arts to preserve for the ages.

The team that selects television programmes for the archive tries to represent the general output, but their curatorial concerns centre on whether to pick out those episodes of, say, Britain’s Got Talent that had the most impact at time of broadcast or those that were most typical of the talent show genre in 2009.

Source: The Guardian (via AMIA)

See Also: On a Somewhat Related Note:
HBO Archives Opens Up March of Time Vault

Library of Congress Capturing Web Content During Supreme Court Nomination Process

Monday, June 22nd, 2009

The other day we posted that the Library of Congress will be capturing some Twitter “tweets” during the Supreme Court confirmation process of Judge Sonia Sotomayor.

Today, while browsing the Library of Congress web site, we came across a larger project to capture and archive web content dealing with the confirmation. In other words, the “tweets” are part of a larger LC initiative.

From an LC Web Capture web page:

The Supreme Court Nominations 2009 Web Archive will be a selective collection of Web sites archived from June 2009 through the completion of the hearings process. Web sites collected will include materials produced by watchdog, public policy, and political advocacy groups, blogs and tweets, community and religious organizations, foreign and domestic news sources, educational and research institutions, and independent websites.

Collection dates: June 2009 through confirmation hearings.

Source: LC

American Archivist in JSTOR

Thursday, June 18th, 2009

From the SAA Web Site:

SAA signed an agreement in April to have American Archivist participate in JSTOR, an independent not-for-profit organization that is dedicated to making a wide range of intellectual content available in a trusted digital archive. Currently the JSTOR archive includes the complete back runs of more than 800 journals, which are available to libraries. American Archivist would be part of the newly developing Arts and Sciences VII collection under “Library and Information Sciences.” The entire run of the journal is projected to debut in 2010. The recently retired Charles Schultz has generously donated his back issues of the journal (1963 through 2008) to SAA for use by JSTOR. Issues prior to 1963 will dovetail with the OCLC digitization project.

Source: Society of American Archivists

See Also: Learn More About the OCLC Project Mentioned in the Post