Archive for the ‘Digital Preservation’ Category

The October 2009 Issue of the Digital Preservation Newsletter Is Now Online from NDIIPP and the Library of Congress

Friday, October 2nd, 2009

Access the Complete Issue (2 pages; PDF)

This Issue Includes:

+ News of the 2009 Best Practices Exchange and the Preserving Digital News meeting

+ An article about a Digital Preservation Workshop held at the Library of Congress

+ The Netherlands Coalition for Digital Preservation sponsored a national conference and published an interim report

+ Government Computer News recognizes NDIIPP among the best of Federal information technology initiatives of 2009

+ New guidelines for content categories and digitization objectives published by the Federal Agencies Digitization Guidelines Initiative

+ An interview podcast about the DuraSpace pilot project is available from Federal News Radio

Source: National Digital Information Infrastructure and Preservation Program (NDIIPP) / Library of Congress

Cloud Computing and Digital Preservation on Federal News Radio

Wednesday, September 23rd, 2009

From the Text:

The Library of Congress has a mission that is very similar to several Federal agencies…they are preserving huge amounts of records. And like Federal agencies, they are looking at new technologies to meet that mission. One way they’re doing that is through a pilot project with DuraSpace that will store some records in the cloud. Bill LeFurgy is the digital initiatives project coordinator at the Library of Congress, and he told me how the pilot project will work.

Listen Online or Download the Audio of the Interview (mp3). It runs about 14 minutes.

Source: Federal News Radio

Digitization: Chronicling America Illustrated Newspaper Pages from 1906 Added to LC Flickr Photostream and Other Chronicling America Links

Saturday, September 12th, 2009

From the Announcement:

The Library of Congress has added another year’s worth of historic illustrated newspaper pages to the LC Flickr photostream. The New-York Tribune Illustrated Supplement section of 1906, printed on Sundays, includes published images of signature events of 1906, including: construction of the Panama Canal, 3 weeks of coverage on the San Francisco Earthquake, the Chicago meat packing industry, storm devastation in Hong Kong and Alabama and more…. In Flickr, you can tag it, add a note, share it… and even read more about it!

Access the Library of Congress Flickr Stream

Access the Chronicling America Database and Directory

See Also: Milestones: Library of Congress, National Endowment for the Humanities Celebrate Millionth Page in Chronicling America Program

See Also: Now Available: Webcast: One Millionth Page in Chronicling America

See Also: New from the Library of Congress: Chronicling America Topic Guides

See Also: Library of Congress Flickr Stream Adds European Images

Source: LC

Now Online: September 2009 Issue of the Library of Congress Digital Preservation Newsletter

Friday, September 11th, 2009

Access Full Issue

Articles Include:

+ Profile of Digital Preservation Pioneer David Riecks

+ An article about recently published white papers on preserving digital legislative data

+ LOCKSS Chief Scientist David Rosenthal speaks at Library of Congress

+ An article about the K-12 Web Archiving Program

+ Library of Congress digital initiatives profiled in Library Journal

+ News of the 2009 SAA annual meeting and Saving Public Policy Web Content meeting

+ Upcoming Events: iPres 2009 and the Cultural Heritage Online Conference

Source: National Digital Information Infrastructure and Preservation Program at the Library of Congress

Another New Web Archiving Service: WAX from Harvard University

Tuesday, September 8th, 2009

A few weeks ago we posted about the new California Digital Library Public Web Archive Service Collections.

Today, via DigitalKoans, we learned of another web archiving service, named WAX, at Harvard University.

From the Web Site:

The public interface for Harvard’s new Web Archive Collection Service (WAX) launched on February 4, 2009. WAX began as a pilot project in July 2006, funded by the University’s Library Digital Initiative (LDI) to address the management of web sites by collection managers for long-term archiving. It was the first LDI project specifically oriented toward preserving “born-digital” material. WAX has now transitioned to a production system supported by the University Library’s central infrastructure.

Collection managers, working in the online environment, must continue to acquire the content that they have always collected physically. With blogs supplanting diaries, e-mail supplanting traditional correspondence, and HTML materials supplanting many forms of print collateral, collection managers have grown increasingly concerned about potential gaps in the documentation of our cultural heritage.

WAX was developed as an initial, and only partial, response to these and other concerns, which range from technical feasibility to legal and financial implications. The pilot focused on harvesting content from the surface web: content that is discoverable to search engines through web crawlers, as opposed to content hidden from web crawlers in a database or restricted by password or login protection.

Review the WAX Collections

Much More about WAX from DigitalKoans

Source: WAX, DigitalKoans

Note: Of course, don’t forget about The Wayback Machine from the Internet Archive (IA). It’s now home to over 150 billion archived web pages. The IA also does “custom” web archiving via their very cool Archive-It service.

Web Archiving: Administration Wants Help Archiving its Facebook, Twitter Content

Wednesday, September 2nd, 2009

And yet another role for the information professional.

From the Article:

The Executive Office of the President (EOP) plans to hire a company to help archive the ever-expanding amount of data that qualifies as presidential records that the office publishes on publicly accessible Web sites and social networking sites, according to a recently published solicitation notice.

The EOP wants a contractor to capture and store content posted on the sites that the administration is required to maintain under the Presidential Records Act (PRA), the notice said. According to the request for quote (RFQ) notice, the contractor will also be responsible for transferring the captured data to the National Archives and Records Administration (NARA) for historical preservation.

[Snip]

White House officials want the company to crawl and archive PRA content on third-party Web sites where the EOP maintains a presence, such as the White House’s Facebook page and Twitter feed, the notice said. According to the RFQ, the EOP wants data capture to be automatic rather than how it’s currently done.

EOP officials want to capture the posted content at least twice a day, the notice said. In addition, they said the vendor will have to make the data organized and searchable and provide a Web-based tool that government employees can use to manage the record-keeping.

The data will also need to be stored in a way that will let NARA ingest the records into the agency’s next-generation Electronic Records Archives system.

Access the Complete Solicitation (PDF)

Source: FCW

English Language: A Future for Our Digital Memory: Permanent Access to Information in the Netherlands

Wednesday, September 2nd, 2009

From the Announcement:

A future for our digital memory: permanent access to information in the Netherlands. A twenty-page English-language summary of the report of the Dutch National Digital Preservation Survey (Dutch report published 1 July 2009).

In order to underpin its strategy, the NCDD decided to build a detailed picture of the current situation in the public sector in the Netherlands. Can institutions or domains be identified which have successfully risen to the challenge of digital preservation and permanent access? What categories of data are in danger of being lost? How can the risks be managed? The so-called National Digital Preservation Survey was funded by the Ministry of Education, Culture and Science, and was held in the first six months of 2009.

A team of three researchers conducted some seventy interviews with stakeholders in three distinct sectors: government & archives, the research community, and cultural heritage institutions.

Access the Complete English Language Report: A future for our digital memory: permanent access to information in the Netherlands, English-language summary

Source: Netherlands Coalition for Digital Preservation

A Data Deluge Swamps Science Historians

Friday, August 28th, 2009

From the Article:

In a vault beneath the British Library here, Jeremy Leighton John grapples with a formidable challenge in digital life. Dr. John, the library’s first curator of eManuscripts, is working on ways to archive the deluge of computer data swamping scientists so that future generations can authenticate today’s discoveries and better understand the people who made them.

His task is only getting harder. Scientists who collaborate via email, Google, YouTube, Flickr and Facebook are leaving fewer paper trails, while the information technologies that do document their accomplishments can be incomprehensible to other researchers and historians trying to read them. Computer-intensive experiments and the software used to analyze their output generate millions of gigabytes of data that are stored or retrieved by electronic systems that quickly become obsolete.

Source: Wall Street Journal
Hat Tip: ACM Tech News

Experts Discuss Saving Public Policy Web Content

Thursday, August 27th, 2009

From the Post:

Curators and public policy experts representing commercial, academic and non-profit organizations convened for a two-day meeting at the Library of Congress to explore strategies for preserving public policy content that has been made available only on the web.

As more and more public policy content becomes available only on the web, the challenge of providing enduring access to, and long-term preservation of, public policy information grows increasingly complicated. The Library’s National Digital Information Infrastructure and Preservation Program is exploring ideas about how to work with others to preserve this information.

NDIIPP is interested in this area as part of its work to catalyze development of a national collection of digital content through a national network of preservation partners. To date, the Program has engaged over 130 partners from the public and private sectors to work together to develop approaches and solutions for saving America’s digital heritage.

Source: NDIIPP / Library of Congress

Portico Announces Digital Preservation Agreement with University of Technology Sydney Library

Tuesday, August 25th, 2009

From the Announcement:

Portico (www.portico.org) is pleased to announce the signing of an agreement with University of Technology Sydney (Australia) Library (UTS Library) to preserve its online collection of 11 e-journals. UTS Library created UTSeScholarship in 2004, an initiative that provides a secure, stable, digital home for the scholarly output of the University’s staff, students, and colleagues with whom they collaborate. UTSeScholarship includes UTSePress, which publishes these scholarly journals as well as books and conference proceedings.

Through this agreement with Portico, UTS Library ensures that the online versions of its journals will be preserved and available for future scholars, researchers, and students. UTS Library has named Portico as a mechanism to fill post-cancellation access claims and has also agreed to make an annual financial contribution to Portico.

See Also: Portico Homepage with the Latest Portico Stats

Source: Portico

Internet Archive Requests Your Help in Preserving GeoCities Materials

Tuesday, August 25th, 2009

From a Blog Post:

Yahoo! announced that it will close [GeoCities] on October 26, 2009, steering users towards their paid service instead. We have been archiving GeoCities sites for years in our crawls, but, as goes with the territory of being web archivists, we want to make sure to gather as many of the sites as possible before the looming end of an era, 10-26-2009. If you have a page with GeoCities or are a fan of a particular page, please use our special collections page to ensure its preservation. Additionally, please refer to the Archive Team’s efforts to save cultural information that may be lost with the site closing. Yahoo! is also offering valuable advice at their help center.

Source: Internet Archive

Old South Carolina Newspapers to be Digitized, Put Online

Monday, August 24th, 2009

From the Article:

A rich but dusty archive of South Carolina history (newspapers published throughout the state from 1860 to 1922) will become Web accessible through the S.C. Digital Newspaper Project, an initiative of University Libraries.

The project, funded with a two-year $350,000 grant from the National Endowment for the Humanities (NEH), will scan, enhance, and deliver to the Library of Congress an estimated 100,000 pages of selected S.C. newspaper titles. The resulting online archive will reflect major artistic, literary, religious, ethnic, cultural, economic, and political events in South Carolina and surrounding states.

Source: University of South Carolina

Meet the Digital Preservation Pioneers: David Riecks

Friday, August 21st, 2009

From the Article:

When you take a digital photo, the camera immediately records information about the aperture setting, shutter speed, focal length, metering mode and more. Some cameras have a global positioning system that adds information about where the photo was taken.

But David Riecks (rhymes with “clicks”) – professional photographer, digital-image technologist and metadata evangelist – believes that a digital photo should contain more information about itself. And he is on a mission to spread the word about the current and long-term value of photo metadata.

[Snip]

He became intrigued by keyword search and he developed a home-grown approach to metadata, judiciously adding keywords and descriptive metadata to each digital image. If a client requested a photo of a cow in a pasture, Riecks could search his image database for the keywords “cow” and “pasture” and quickly locate any of his photos that contained those keywords. The system helped streamline his business.
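The article does not describe how Riecks's home-grown system was built, so the following is only a minimal sketch, assuming each image is tagged with a set of lowercase keywords, of how an AND keyword search like his “cow” + “pasture” lookup can work. The filenames, the `catalog` dictionary, and the `find_images` helper are all hypothetical.

```python
from typing import Dict, Iterable, List, Set

# Hypothetical keyword catalog: image filename -> set of descriptive keywords.
catalog: Dict[str, Set[str]] = {
    "IMG_0001.jpg": {"cow", "pasture", "fence", "summer"},
    "IMG_0002.jpg": {"barn", "tractor"},
    "IMG_0003.jpg": {"cow", "barn", "winter"},
    "IMG_0004.jpg": {"pasture", "cow", "sunset"},
}


def find_images(catalog: Dict[str, Set[str]], keywords: Iterable[str]) -> List[str]:
    """Return filenames tagged with every requested keyword (an AND search)."""
    # Normalize the query to lowercase to match the lowercase catalog tags.
    wanted = {k.lower() for k in keywords}
    # A photo matches when the requested keywords are a subset of its tags.
    return sorted(name for name, tags in catalog.items() if wanted <= tags)


print(find_images(catalog, ["cow", "pasture"]))
# prints ['IMG_0001.jpg', 'IMG_0004.jpg']
```

A working photo archive would more likely keep these keywords inside each image's embedded metadata (for example, an IPTC/XMP keywords field) so they travel with the file, rather than in a separate dictionary as this sketch does.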

See Also: See the Complete List of Digital Preservation Pioneers

Source: NDIIPP / Library of Congress

Webcast: Ensuring the Longevity of Digital Documents

Thursday, August 20th, 2009

Yesterday, we posted: Digital Preservation: LOCKSS Chief Scientist David Rosenthal Speaks at Library of Congress. The post contained links to slides and several related documents but not the video of Rosenthal’s presentation.

Today, the video was made available on the LC web site. You can view it here. The video runs 76 minutes.

Source: Library of Congress

Conference Paper: Citizen-Created Content, Digital Equity and the Preservation of Community Memory

Wednesday, August 19th, 2009

The following paper, written by Penny Carnaby and presented by Sue Sutherland (both of the National Library of New Zealand), will be delivered at the upcoming World Library and Information Congress: 75th IFLA General Conference and Assembly in Milan, Italy.

From the Abstract:

While the complex issues concerning the protection and preservation of digital assets are better understood by the information professions, there is still much thinking required about the preservation and protection of the new wave of citizen-created content.

Traditionally information professionals in all types of memory institutions have clearly met the need for, and nature of, the preservation activities around formal and authoritative knowledge services and systems. However, informal, citizen-created knowledge activities are far less straightforward in terms of preservation. These activities arise and evolve as individual citizens develop as authors, content creators, thought leaders, filmmakers, blog diarists, etc. There is at present an extraordinary unleashing of content creation by individual citizens.

This development challenges established organisational systems and professional practice in an unprecedented way. This paper outlines some of the issues involved in the preservation of digital assets in this new environment. It explores how all memory institutions including archives, galleries, museums and libraries in particular, can value and protect a country’s digital assets in both the formal and informal arena.

Access the Full Text (10 pages; PDF)

Source: International Federation of Library Associations

Digital Preservation: LOCKSS Chief Scientist David Rosenthal Speaks at Library of Congress

Tuesday, August 18th, 2009

From the Summary:

When David Rosenthal talks, people listen. They may not always agree with the Chief Scientist of the LOCKSS program based at Stanford University, but they engage with what he has to say.

This was the case on July 27, 2009, when a large crowd gathered at the Library of Congress to hear Rosenthal’s presentation How Are We Ensuring the Longevity of Digital Documents? (pdf, 321 Kb). Rosenthal’s talk was a reprise of his widely-discussed plenary at the Spring 2009 Coalition for Networked Information Task Force meeting. In his introduction at the CNI meeting, CNI Executive Director Clifford Lynch told the audience that Rosenthal’s work had changed his thinking about digital preservation.

+ Rosenthal’s presentation at the Library was filmed and will soon be available as a webcast. Update: The video (76 minutes) is now available.

+ How Are We Ensuring the Longevity of Digital Documents? (2009): Presentation Slides (PDF)

Source: NDIIPP, Library of Congress

The “What’s New at the Internet Archive” Blog

Thursday, August 13th, 2009

The other day we posted some updated statistics about the many collections from the Internet Archive.

We didn’t mention that a great way to learn about new features, cool collections, etc. is the “What’s New at the Internet Archive” blog, available here.

From the looks of it, the blog has about two or three posts a month. The latest post (published earlier today) is a lot of fun. It looks at (and provides links to) drive-in movie advertising accessible via the IA.

Almost forgot: if you prefer RSS, no problem. The “What’s New at the Internet Archive” blog has an RSS feed accessible here.

Want Internet Archive news? Take a look at their Twitter feed: http://twitter.com/internetarchive

What’s the average lifespan of a Web page?

Wednesday, August 12th, 2009

Marieke Guy does an impressive job pulling together several estimates along with the underlying papers they come from. It’s a challenging question, and it gets more so each day in this era of Twitter and similar services.

Here are some of those numbers:

+ 44 days (the average lifespan of a URL) from Brewster Kahle’s 1997 article, “Preserving the Internet.”

+ 75 days from Michael Day’s report, Collecting and preserving the world wide web

+ 100 days from a November 2003 Washington Post article: On the Web, Research Work Proves Ephemeral.

The Internet Archive now gives 44 to 75 days as its ballpark figure.

Access the full post and underlying documents

Source: JISC-PoWR

European Publishers Target Google

Wednesday, August 12th, 2009

From the Article:

Google is facing growing opposition in Europe to its landmark US legal settlement with book publishers and authors, raising a fresh challenge to an agreement that could help determine the future structure of the digital books business.

The article focuses on issues in France, Germany, and from a group of Nordic publishing associations. It also includes comments from Google and Peter Brantley from the Internet Archive.

Source: Financial Times

New Zealand Launches Digital Continuity Action Plan

Tuesday, August 11th, 2009

From the Web Site:

The Digital Continuity Action Plan is a world-first initiative [our emphasis] which will prevent important public records being lost and ensure today’s information is available tomorrow. Today most public information is created digitally, but the continuity of that digital information over time has become a real concern. To address this concern the plan has been developed as an all-of-public-sector programme to assist and support agencies to overcome issues with storing, accessing, using and reusing the digital information they produce.

Read the Plan (HTML) | PDF

Source: Archives New Zealand