Archive for the ‘Digital Preservation’ Category

Digital Preservation: ACM Will Partner with Portico and CLOCKSS for Preservation of Its Digital Library Resources

Friday, November 6th, 2009

From an Announcement:

ACM (the Association for Computing Machinery) announced today that it is providing its institutional library customers with advanced electronic archiving services to preserve their valuable electronic resources. These services, provided by Portico and CLOCKSS, address the scholarly community’s critical need for long-term solutions that assure reliable, secure, deliverable access to their burgeoning digital collection of scholarly works. ACM is offering these services to protect the vast online collection of resources in its Digital Library (DL), which are used by over 1 million computing professionals and students worldwide.

“By partnering with Portico and CLOCKSS, we are able to meet a growing demand in the library community for a trusted, reliable third-party archive, and to ensure that digital collections remain accessible to future scholars, researchers, and students,” said Scott Delman, ACM Group Publisher. “Scientific discovery and the educational process are not possible without reliable access to the accumulated scholarship of the past and secure preservation of the scholarly record, and these agreements are a clear step forward with the relationship between the ACM and the library community.”

By investing in long-term digital preservation of content, ACM’s aim is to make it easier for libraries to accelerate their transition away from print and free up resources invested in print collections in favor of new and innovative electronic products and services.

Much More After a Click

New Video on Web Archiving

Friday, November 6th, 2009

From the Description:

Web content changes all the time. If we don’t save that content before it disappears, a major part of our cultural history will be lost.

The Library of Congress is working to provide permanent access to web content of historical importance. It selects websites for collection, requests permissions from the website owners, addresses the technology of collecting websites and preserves the websites and makes them available.

This video examines those four challenges.

Access the Video (embedded here)

A text transcript is also available (PDF)

Source: National Digital Information Infrastructure and Preservation Program

Bibliotheca Alexandrina: A Digital Revival

Tuesday, November 3rd, 2009

The Bibliotheca Alexandrina is one busy place. If you want to learn more, read on through our highlights, but make sure to read the complete article. Our highlights are just a sample of what’s going on.

From the Article:

The International School of Information Science (ISIS), a research institute affiliated with the BA [Bibliotheca Alexandrina], aims at furthering the BA’s goals of being a leading institution in knowledge dissemination and, specifically, promoting research and development related to the digital libraries. Toward that goal, ISIS has embarked on an array of ambitious projects, in partnership with world-class institutions. These include hosting a mirror site for the Internet Archive, participating in the Million Book Project, organizing the digital archive of the Gamal Abdel Nasser collection, digitizing 113 years of Al-Hilal magazine, presenting the first-ever complete digital version of Description de l’Egypte, conducting advanced research such as the Arabic component of the UN-sponsored Universal Networking Language computerized multi-language translation program, and offering the most advanced 3D virtual imaging techniques in a virtual immersive environment for science and technology applications. Thus, despite being barely seven years in existence, the BA already has a substantial record of achievements.

Among the other projects you’ll read about are:

+ The Digital Assets Repository (DAR)

+ Memory of Modern Egypt Digital Repository

+ Archive documenting the history of the Suez Canal

+ SuperCourse

To empower science educators worldwide, the BA is working with a team of specialists, in partnership with the University of Pittsburgh, to launch the first science SuperCourse, comprising thousands of PowerPoint lectures made available for free to teachers and lecturers, who can use the lectures as they see fit in their teaching of science. The SuperCourse has been effectively implemented in the area of Public Health and Epidemiology, with a network of 65,000 scientists in 174 countries, providing more than 3,500 lectures in 31 languages. The BA maintains a mirror site of SuperCourse, which receives an average of one million hits per month, and is working on setting up a similar course in all fields of science.

Much More in the Complete Article

Source: EDUCAUSE Review
Hat Tip: OAN

GeoCities Says So Long as Internet Archive Works to Preserve Content

Tuesday, October 27th, 2009

In August, we first posted about the Internet Archive (IA) asking GeoCities users to make sure their content was archived by the IA. Why? As of yesterday, GeoCities is no longer online.

From the Article:

Yahoo, which acquired the site for $3.57bn (£2.17bn) in 1999 at the height of the dotcom boom, said sites would no longer be accessible from 26th October.

However, many of the pages have been archived and will still be available to view via the nonprofit Internet Archive project.

The giant digital library, which has been archiving the public web since 1996, has set up a special project to archive GeoCities before it is lost forever.

“We’ve collected a lot of GeoCities sites over the years – but might not have every site and every page,” the Internet Archive said.

Access the Complete Article

Source: BBC

See Also: Saving a Historical Record of GeoCities (via Internet Archive)

Library of Congress’ National Digital Information Infrastructure and Preservation Program Wins Government Computer News Award

Saturday, October 24th, 2009

The NDIIPP was one of 11 projects to receive a GCN [Government Computer News] Award for Agency IT Achievement.

From the Summary:

It took two centuries for the Library of Congress to acquire its 29 million books and 105 million other items. Today, it only takes 15 minutes for the world to produce an equal amount of information in digital form, creating unprecedented archiving challenges for the Library of Congress. The Library is meeting the challenge of digital preservation by developing new tools to transfer large quantities of digital content. To date, more than 3 million files have been transferred and stored using the BagIt specification. Due to the Library’s digital preservation initiatives, more than 1,000 collections of digital content have been selected, captured, preserved, and made available to the U.S. public and online visitors across the globe.

Access the Complete Article

We are warned to be careful about what we put online because data on the Internet lives forever. But keeping random copies of files on servers, routers and databases is not the same as preservation, said Martha Anderson, director of program management for the Library of Congress’ National Digital Information Infrastructure and Preservation Program. Digital data can be ephemeral. “That is the paradox,” she said.

Much More in the Summary and Complete Article

Source: GCN

See Also: Library of Congress News Release

Getting to Know the HathiTrust Digital Library

Friday, October 23rd, 2009

Barbara Quint Writes:

With all the controversy still swirling around Google Books and its post-settlement offerings, an alternative route to the millions of digitized books and journals supplied by leading Google Book Search library partners has arrived. The HathiTrust (www.hathitrust.org) is a collaboration of 25 research libraries already participating in Google Book Search to produce a shared digital repository for preservation and access to a curated collection. By mid-November, the HathiTrust Digital Library will have a full-featured, full-text search service for 4.3-5 million items. The searches will retrieve bibliographic citations and page references, including those for in-copyright books. Content will extend beyond the digitized copies of books returned to early library partners by Google. HathiTrust is pushing to acquire other digitized special collections from its members, as well as making arrangements for opening access to university press books.

[Snip]

The new launch will open indexing to nearly 1.5 billion pages from well more than 4.3 million volumes with full-text searching by keyword or phrase. (Just between us, if you simply cannot wait until mid-November, go to http://babel.hathitrust.org/cgi/ls. [John] Wilkin, [associate university librarian at the University of Michigan and executive director of the HathiTrust], tipped me off that, [our emphasis] although this “experimental search” site claims to search only 500,000 documents, it actually includes the full 4.3-5 million volumes. Feedback options appear at the top and bottom of each search results page.) The system already had the equivalent of library cataloging searching, though they expect to upgrade even that kind of searching under a cooperative program with OCLC.

Much More in the Complete Article

Source: InfoToday NewsBreaks

Article: Missing Links: The Enduring Web

Thursday, October 22nd, 2009

From the Abstract:

The Web runs at risk. Our generation has witnessed a revolution in human communications on a trajectory similar to that of the origins of the written word and language itself. Early Web pages have an historical importance comparable with prehistoric cave paintings or proto-historic pressed clay ciphers. They are just as fragile. The ease of creation, editing and revising gives content a flexible immediacy: ensuring that sources are up to date and, with appropriate concern for interoperability, content can be folded seamlessly into any number of presentation layers. How can we carve a legacy from such complexity and volatility?

Access the Complete Article (PDF)

Source: International Journal of Digital Curation (4.2)

Preserving Internet Content

Tuesday, October 13th, 2009

From the Web Site:

On October 7, 2009, the IIPC [International Internet Preservation Consortium] sponsored a free, one-day event, Active Solutions for Preserving Internet Content, following iPRES 2009, the 6th International Conference on Preservation of Digital Objects, held at the Mission Bay Conference Center, San Francisco. Slide presentations are available on the conference program page.

Presentations with Slides Include:

+ Billions and billions of objects, METS, PREMIS, oh my! (Gina Jones)

+ Preserving Access – Making more informed guesses about what works (David Pearson)

+ “Here be dragons” – Strategies for dealing with viruses in the web archive (Matt Holden)

+ Say Emulate; He Says Migrate (David Pearson)

+ Keep Websites Alive (Jeffrey van der Hoeven)

+ What do web archivers (or is it archivists) really do? (Gina Jones)

+ Web Archives Are Forever: defining a workflow for long term preservation of web archives (Maureen Pennock)

+ Square pegs? Fitting web archives into the digital preservation repository of the National Library of New Zealand (Kevin De Vorsey)

+ Continuity and Preservation: The National Archives approach to maintaining permanent access to the web presence of UK Central Government (Amanda Spencer and Alison Heatherington)

+ It’s the end of a project, as we know it: a leading discussion on experiences and issues in embedding web archiving and preservation in an organization (Marcel Ras and Hilde van Wijngaarden)

Source: netpreserve

NDIIPP Releases Web Archiving Video

Friday, October 9th, 2009

From the Story:

Web content changes all the time. If we don’t save that content before it disappears, a major part of our cultural history will be lost.

This is the message of the second video in the Library of Congress National Digital Information Infrastructure and Preservation Program’s video series. The just-released video, “Web Archiving,” discusses the Library’s approach to collecting and preserving content found on the World Wide Web.

The three-minute video is targeted to librarians, archivists, and others interested in working with digital content.

[Snip]

The “Web Archiving” production is the second in the series, following the BagIt video that was released in July 2009. The BagIt video describes a specification for securely transferring digital content.
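The BagIt specification mentioned above is simple enough to sketch. A bag is a directory holding a `bagit.txt` declaration, a `data/` payload directory, and a manifest that lists a checksum for every payload file; whoever receives the bag recomputes those checksums to confirm nothing changed in transit. The Python below is a minimal sketch of that idea, assuming SHA-256 manifests and version 1.0 of the format (published in 2018 as RFC 8493). The helper names `make_bag` and `verify_bag` are hypothetical, and the sketch skips optional tag files such as `bag-info.txt` and the completeness checks a real validator performs.

```python
"""Minimal BagIt-style packaging sketch (hypothetical helpers, not an official tool)."""
import hashlib
import tempfile
from pathlib import Path
from typing import Dict

# Required bag declaration for BagIt 1.0 (assumed version for this sketch).
BAGIT_DECLARATION = "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n"


def make_bag(bag_dir: str, files: Dict[str, bytes]) -> Path:
    """Write `files` into bag_dir/data and record a SHA-256 payload manifest."""
    bag = Path(bag_dir)
    payload = bag / "data"
    payload.mkdir(parents=True, exist_ok=True)
    manifest_lines = []
    for rel_name, content in sorted(files.items()):
        target = payload / rel_name
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(content)
        digest = hashlib.sha256(content).hexdigest()
        # Manifest lines are "checksum path", with paths relative to the bag root.
        manifest_lines.append(f"{digest}  data/{rel_name}\n")
    (bag / "bagit.txt").write_text(BAGIT_DECLARATION, encoding="utf-8")
    (bag / "manifest-sha256.txt").write_text("".join(manifest_lines), encoding="utf-8")
    return bag


def verify_bag(bag_dir) -> bool:
    """Recompute each listed payload checksum and compare it with the manifest (fixity only)."""
    bag = Path(bag_dir)
    manifest = (bag / "manifest-sha256.txt").read_text(encoding="utf-8")
    for line in manifest.splitlines():
        if not line.strip():
            continue
        expected, rel_path = line.split(maxsplit=1)
        file_path = bag / rel_path
        if not file_path.is_file():
            return False  # a file listed in the manifest is missing
        if hashlib.sha256(file_path.read_bytes()).hexdigest() != expected:
            return False  # content changed since the bag was made
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        bag = make_bag(f"{tmp}/transfer-bag", {"page1.tif": b"scan 1", "meta/page1.xml": b"<page/>"})
        print("valid after transfer:", verify_bag(bag))
        (bag / "data" / "page1.tif").write_bytes(b"corrupted")
        print("valid after corruption:", verify_bag(bag))
```

Keeping the checksums in a plain-text manifest beside the payload means any tool on either end of a transfer can check fixity without special software, which is the point of packaging content this way.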

View the Web Archiving Video

Video Presentations Homepage

Source: National Digital Information Infrastructure and Preservation Program, Library of Congress

A Look at the Major League Baseball Video Library Film Archive

Thursday, October 8th, 2009

If you’re a baseball fan, this is a “must read.”

From the Article:

No American sport has a past as deep and cherished as baseball’s. But precious little of the sport’s history is preserved in moving images. Much occurred before the television age, leaving only grainy, scattershot clips culled from newsreels and home movies — and rarely does it show a player of [Babe] Ruth’s stature.

The newly arrived Ruth film is part of the video collection of Major League Baseball Productions, the league’s official archivist, which spans more than 100 years and includes about 150,000 hours of moving images. Most of the collection is stored in plastic cases that line metal shelves of a room labeled “Major League Baseball Film and Video Archive.” The overflow rests in storage a few miles away, in Fort Lee, N.J.

The article goes on to describe how Frank Caputo, manager of the MLB Network video library film archive, and Joe Porciello research a newly discovered 8-millimeter clip (it was found by a New Hampshire man in his grandfather’s home movie collection).

Source: The New York Times

See Also: Just in Time for the Major League Playoffs and World Series: Baseball Resources at the Library of Congress Web Guide

On Google and Usenet

Wednesday, October 7th, 2009

The article begins with one paragraph about Google Book Search but the story actually focuses on the Usenet archive (Google Groups).

From the Article by Kevin Poulsen:

…a few geeks with long memories remember the last time Google assembled a giant library that promised to rescue orphaned content for future generations. And the tattered remnants of that online archive are a cautionary tale in what happens when Google simply loses interest.

That library is Usenet, a vast internet- and dial-up-based message board system erected in 1980. Though moribund today, for decades Usenet was the paper of record for the online world, and its hundreds of millions of “newsgroup” postings chronicle everything from the birth of the web to the rise of Microsoft, as well as more trivial matters.

In February 2001, Google rescued that history when it acquired the New York-based Deja.com, and with it a Usenet archive going back to 1995. It turned the archive into Google Groups, in a move that was cheered by net geeks who had seen Deja’s reliability declining, and were certain that the supremely competent Google would save it.

[Snip]

Flash forward nearly eight years, and visiting Google Groups is like touring ancient ruins.

[Snip]

Searching within a newsgroup, even one with thousands of posts, produces no results at all. Confining a search to a range of dates also fails silently, bulldozing the most obvious path to exploring an archive.

[Snip]

“The search results are extremely poor,” says network pioneer Brad Templeton. “Like nobody cares.”

Henry Spencer, whose Usenet archive forms much of Google Groups, is troubled by the company’s curatorship. “Google does get a lot of credit for putting it together and making it available,” Spencer says. “But search capabilities are important for such a large collection of data. The archive’s value to the community is considerably reduced if it’s not conveniently searchable.”

Source: Wired

Legal Delays Have Blown a Hole in UK’s Digital Heritage

Monday, October 5th, 2009

From the Article:

Digital literature, online scientific research and internet journalism that should have been saved in the nation’s main libraries over the past five years may have been lost because ministers have failed to give them the legal power to copy and archive websites, the Guardian has learned.

Senior executives at the British Library and the National Library of Scotland (NLS) are dismayed that legislation giving them the right to collect online and digital material is still not in force, more than six years after it was passed by parliament.

The omission has meant the libraries – which are legally required to archive books, newspapers and journals – have failed to record online coverage of major events such as the Iraq and Afghanistan wars, the release of the Lockerbie bomber and the MPs’ expenses scandal.

[Snip]

Phil Spence, head of operations at the British Library, said the failure had left a major “digital black hole” in the library’s collections, with huge gaps in the archives for researchers, scientists and historians.

It meant the British Library was unable to store the BBC’s website, the National Gallery or British Museum website, any UK newspapers’ websites, or scientific journals published online because of copyright issues. Blogs, community pages, government and business websites can only be archived after laborious voluntary agreements. The act would protect the libraries against copying defamatory material, but would also protect a publisher’s copyright.

“We’ve lost five years of digital content which is gone potentially for ever, and the ability of the nation to capitalise on that as well,” he said.

Much More in the Full Text Article Including a 3.5 Minute Audio Report

Source: The Guardian

Report From Digital Preservation Workshop Held in DC

Monday, October 5th, 2009

From the Report:

Over twenty Library of Congress staff had an opportunity to participate in a special workshop, Digital Preservation Management: Implementing Short-term Strategies for Long-term Problems, hosted by the Inter-university Consortium for Political and Social Research, held September 21-22, 2009 in Washington, DC.

Initially developed at the Cornell University Library and supported with funding from the National Endowment for the Humanities, the Digital Preservation Management workshops are structured curricula geared toward managing digital preservation planning and policies for libraries, archives, and other cultural heritage institutions. The goal of the workshop is to provide those managers and staff responsible for digital assets the practical means to exercise stewardship in an age of technological change. Many institutions struggle with the initial stages of developing digital preservation policies, and the workshop aids participants in understanding the fundamental pieces of how to think about and enact planning for organizations.

[Snip]

The next five-day workshops will be held October 11-16, 2009 at the University of Michigan – where Martha Anderson, director of program management for the National Digital Information Infrastructure and Preservation Program, will be the keynote speaker – and June 13-18, 2010 at MIT in Cambridge, Massachusetts. For more information about the workshops, please visit: www.icpsr.umich.edu/dpm/workshops/fiveday.html.

Source: National Digital Information Infrastructure and Preservation Program / Library of Congress

NDIIPP Conducts Two Day Workshop on Preserving Digital News

Friday, October 2nd, 2009

From the Post:

The Internet has impacted news and journalism more than almost any other category of information. Newspapers have always been important research resources for users of libraries, archives and historical societies. But significant events are now reported in new ways, such as through blogs, podcasts, social-networking services, online news aggregators and multimedia web content. To address this change, the National Digital Information Infrastructure and Preservation Program convened a two-day workshop to discuss a national strategy for collecting and preserving news content that is disseminated only in digital form.

The meeting on September 2-3, 2009, brought together over fifty invited specialists in the field: creators, distributors, archivists, and researchers who depend upon historical news. The topics for discussion included the following:

+ What is digital news? Who produces it? What forms does it take?
+ What is important to preserve for the nation?
+ What collaborative efforts for preservation are succeeding now?
+ What are the roles for content owners and public archives in preserving digital news?
+ What roles do “local” and “national” content and organizations serve?
+ What are some strategies and possible models for addressing the issues in a distributed way?

A number of lively conversations among the diverse participants prompted several innovative solutions, including blogs that self-archive and newspapers that opt-in to public institution web archiving. Case studies and analyses of how historical news is consumed and used, especially with regard to dynamic and multi-media content, were suggested. Local news blogs were deemed an important area to monitor as they seem particularly at-risk and ripe for a distributed solution.

A Bit More in the Complete Article

Source: National Digital Information Infrastructure and Preservation Program (NDIIPP) / Library of Congress

The October 2009 Issue of the Digital Preservation Newsletter is Now Online from the NDIIPP and Library of Congress

Friday, October 2nd, 2009

Access the Complete Issue (2 pages; PDF)

This Issue Includes:

+ News of 2009 Best Practices Exchange and the Preserving Digital News meeting

+ An article about a Digital Preservation Workshop held at the Library of Congress

+ The Netherlands Coalition for Digital Preservation sponsored a national conference and published an interim report

+ Government Computer News recognizes NDIIPP among the best of Federal information technology initiatives of 2009

+ New guidelines for content categories and digitization objectives published by the Federal Agencies Digitization Guidelines Initiative

+ An interview podcast about the DuraSpace pilot project is available from Federal News Radio

Source: National Digital Information Infrastructure and Preservation Program (NDIIPP) / Library of Congress

Cloud Computing and Digital Preservation on Federal News Radio

Wednesday, September 23rd, 2009

From the Text:

The Library of Congress has a mission that is very similar to several Federal agencies…they are preserving huge amounts of records. And like Federal agencies, they are looking at new technologies to meet that mission. One way they’re doing that is through a pilot project with DuraSpace, that will store some records in the cloud. Bill LeFurgy is the Digital initiative project coordinator at the Library of Congress, and he told me how the pilot project will work.

Listen Online or Download (mp3) the Audio of the Interview. It runs about 14 minutes.

Source: Federal News Radio

Digitization: Chronicling America Illustrated Newspaper Pages from 1906 Added to LC Flickr Photostream and Other Chronicling America Links

Saturday, September 12th, 2009

From the Announcement:

The Library of Congress has added another year’s worth of historic illustrated newspaper pages to the LC Flickr photostream. The New-York Tribune Illustrated Supplement section of 1906, printed on Sundays, includes published images of signature events of 1906, including: construction of the Panama Canal, 3 weeks of coverage on the San Francisco Earthquake, the Chicago meat packing industry, storm devastation in Hong Kong and Alabama and more….In Flickr, you can tag it, add a note, share it….and even read more about it!

Access the Library of Congress Flickr Stream

Access the Chronicling America Database and Directory

See Also: Milestones: Library of Congress, National Endowment for the Humanities Celebrate Millionth Page in Chronicling America Program

See Also: Now Available: Webcast: One Millionth Page in Chronicling America

See Also: New from the Library of Congress: Chronicling America Topic Guides

See Also: Library of Congress Flickr Stream Adds European Images

Source: LC

Now Online: September 2009 Issue of the Library of Congress Digital Preservation Newsletter

Friday, September 11th, 2009

Access Full Issue

Articles Include:

+ Profile of Digital Preservation Pioneer David Riecks

+ An article about recently published white papers on preserving digital legislative data

+ LOCKSS Chief Scientist David Rosenthal speaks at Library of Congress

+ An article about the K-12 Web Archiving Program

+ Library of Congress digital initiatives profiled in Library Journal

+ News of the 2009 SAA annual meeting and Saving Public Policy Web Content meeting

+ Upcoming Events: iPRES 2009 and the Cultural Heritage Online Conference

Source: National Digital Information Infrastructure and Preservation Program at the Library of Congress

Another New Web Archiving Service: WAX from Harvard University

Tuesday, September 8th, 2009

A few weeks ago we posted about the new California Digital Library Public Web Archive Service Collections.

Today, via DigitalKoans we learn of another web archiving service named WAX at Harvard University.

From the Web Site:

The public interface for Harvard’s new Web Archive Collection Service (WAX) launched on February 4, 2009. WAX began as a pilot project in July 2006, funded by the University’s Library Digital Initiative (LDI) to address the management of web sites by collection managers for long-term archiving. It was the first LDI project specifically oriented toward preserving “born-digital” material. WAX has now transitioned to a production system supported by the University Library’s central infrastructure.

Collection managers, working in the online environment, must continue to acquire the content that they have always collected physically. With blogs supplanting diaries, e-mail supplanting traditional correspondence, and HTML materials supplanting many forms of print collateral, collection managers have grown increasingly concerned about potential gaps in the documentation of our cultural heritage.

WAX was developed as an initial–and only partial–response to these and other concerns, which range from technical feasibility to legal and financial implications. The pilot focused on harvesting content from the surface web–content that is discoverable to search engines through web crawlers, as opposed to content hidden from web crawlers in a database or restricted by password or login protection.
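The surface-web limit described above comes down to how crawlers discover pages: they start from seed URLs and follow hyperlinks, so a page reachable only by submitting a search form, or one behind a login, never gets fetched. The Python below is a toy sketch of that behavior, not WAX’s actual harvester. It crawls a hypothetical in-memory `SAMPLE_PAGES` dictionary instead of the network and treats any URL missing from that dictionary as unfetchable (for example, password-protected).

```python
"""Toy breadth-first crawler illustrating surface-web discovery (hypothetical data, no network)."""
from collections import deque
from html.parser import HTMLParser
from typing import Dict, List, Optional, Set, Tuple
from urllib.parse import urljoin

# Hypothetical site: URL -> HTML. "/members" is linked but absent (think: login required);
# the search-results page exists but is only reachable by submitting a form.
SAMPLE_PAGES: Dict[str, str] = {
    "http://example.org/": '<a href="/about">About</a>'
                           '<form action="/search"><input name="q"></form>',
    "http://example.org/about": '<a href="/">Home</a> <a href="/members">Members</a>',
    "http://example.org/search?q=suez": "<p>Database results</p>",
}


class LinkExtractor(HTMLParser):
    """Collects href values from <a> tags; forms are deliberately ignored."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag: str, attrs: List[Tuple[str, Optional[str]]]) -> None:
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seed: str, pages: Dict[str, str]) -> Set[str]:
    """Return the URLs actually harvested by following links outward from `seed`."""
    harvested: Set[str] = set()
    queued = {seed}
    frontier = deque([seed])
    while frontier:
        url = frontier.popleft()
        html = pages.get(url)
        if html is None:
            continue  # unfetchable in this toy model (missing or login-protected)
        harvested.add(url)
        extractor = LinkExtractor()
        extractor.feed(html)
        for href in extractor.links:
            absolute = urljoin(url, href)  # resolve relative links against the current page
            if absolute not in queued:
                queued.add(absolute)
                frontier.append(absolute)
    return harvested


if __name__ == "__main__":
    for url in sorted(crawl("http://example.org/", SAMPLE_PAGES)):
        print(url)
```

Run as written, the sketch harvests only the home page and the About page; the form-only search results and the login-protected members page stay out of the archive, which is the kind of hidden content the WAX pilot set aside.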

Review the WAX Collections

Much More about WAX from DigitalKoans

Source: WAX, DigitalKoans

Note: Of course, don’t forget about the Wayback Machine from the Internet Archive (IA). It’s now home to over 150 billion archived web pages. The IA also does “custom” web archiving via its very cool Archive-It service.