Archive for the ‘Digital Preservation’ Category

The Internet Time Machine from the Memento Project

Tuesday, November 17th, 2009

This is a must-read from start to finish. Here are a few snippets to whet your whistle.

Access the Complete Article from New Scientist

Bookmarking a page takes you to its current version – but earlier ones are harder to find (to see an award-winning 1990s incarnation of newscientist.com, see our gallery of web pages past, right). One option is to visit a resource like the Internet Archive’s Wayback Machine. There, you key in the URL of the site you want and are confronted with a matrix of years and dates for old pages that have been cached.

It’s a lot of hassle. But it shouldn’t be, says Herbert Van de Sompel, a computer scientist at Los Alamos. “Today we treat the web like a library in which you have to know how to go and search for things. We’ve a better way.”

That “better way” is a system that gives browsers a “time-travel” mode, allowing users to find web pages from particular dates and times without having to navigate through archives.

[Snip]

“In addition to language and media type, we negotiate in time. So Memento asks the server not for today’s version of this page, but how it looked one year ago, for instance,” says Van de Sompel.
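Van de Sompel’s “negotiate in time” works like ordinary HTTP content negotiation. Alongside the usual Accept headers, the client adds a header naming the moment it wants. The minimal Python sketch below only builds such a request and does not send it. It assumes the `Accept-Datetime` header name that was later standardized for Memento in RFC 7089; the 2009 proof of concept may have used a different, experimental header. `memento_request` is a hypothetical helper, not part of any Memento tool. A Memento-aware server (a “TimeGate”) that received the request would be expected to redirect to the archived copy closest to that date.

```python
from datetime import datetime, timezone
from email.utils import format_datetime
from urllib.request import Request


def memento_request(uri: str, when: datetime) -> Request:
    """Build (but do not send) a GET request for `uri` as it looked at `when`.

    Assumption: uses the RFC 7089 header name `Accept-Datetime`.
    """
    if when.tzinfo is None:
        # Treat naive datetimes as UTC; HTTP dates are always expressed in GMT.
        when = when.replace(tzinfo=timezone.utc)
    when_utc = when.astimezone(timezone.utc)
    req = Request(uri, method="GET")
    # HTTP-date format, e.g. "Tue, 17 Nov 2009 00:00:00 GMT".
    req.add_header("Accept-Datetime", format_datetime(when_utc, usegmt=True))
    return req
```

For the “one year ago” case in the quote, `when` could be `datetime.now(timezone.utc) - timedelta(days=365)`, with `timedelta` also imported from `datetime`. Passing the request to `urllib.request.urlopen` against a real TimeGate would follow the redirect to the archived copy.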

[Snip]

Jakob Voss, a developer with the Common Library Network in Göttingen, Germany, is an early Memento user – and he is already advocating use of Memento for sites with frequently updated pages like Wikipedia.

“Memento is only a proof of concept but it looks very promising and could be a great enhancement to the web. There is little support in today’s browsers for digging into archives, especially those with dynamic content management systems like wikis and weblogs,” Voss says.

You Can Try a Demo Here and Learn More Here


Source: New Scientist

The November/December 2009 Issue of D-Lib Magazine is Now Available

Tuesday, November 17th, 2009

Before we post a selection of what’s in the new issue of D-LIB, ResourceShelf would like to thank Bonita Wilson for editing a great publication. She has been the sole editor of D-LIB since July 2001, and this is her last issue as editor. She’ll now have more time to engage in the “other things” she likes doing at her home on the Chesapeake Bay in Virginia. She’ll continue with CNRI in a part-time capacity.

Here are Some of the Articles in the November/December 2009 Issue of D-LIB:

+ Beyond 1923: Characteristics of Potentially In-copyright Print Books in Library Collections
by Brian Lavoie and Lorcan Dempsey, OCLC Online Computer Library Center

+ Service-Oriented Models for Educational Resource Federations
by Daniel R. Rehak, LSAL; and Nick Nicholas and Nigel Ward, Link Affiliates, Australia

+ From TIFF to JPEG 2000? Preservation Planning at the Bavarian State Library Using a Collection of Digitized 16th Century Printings
by Hannes Kulovits and Andreas Rauber, Vienna University of Technology; and Anna Kugler, Markus Brantl, Tobias Beinert, Astrid Schoger, Bavarian State Library

+ Measuring Citation Advantages of Open Accessibility
by Samson C. Soong, Hong Kong University of Science and Technology

+ The Importance of Digital Libraries in Joint Educational Programmes: A Case Study of a Master of Science Programme Involving Organizations in Ghana and the Netherlands
by Marga Koelen, International Institute for Geo-Information Science and Earth Observation; and Jonathan Arthur Quaye-Ballard, Kwame Nkrumah University of Science and Technology

+ The Practice and Perception of Web Archiving in Academic Libraries and Archives
by Lisa Gregory, University of North Carolina at Chapel Hill

+ Pennsylvania Literary Journal: Google Websites as an Easy Publication Route
by Anna Faktorovich, Indiana University of Pennsylvania

Access the Complete November/December 2009 Issue of D-LIB

Video: Preserving and Providing Access to Digital Info from State Legislatures

Monday, November 16th, 2009

From an Announcement:

A new video features Minnesota Speaker of the House Margaret Anderson Kelliher talking about new methods to preserve and provide access to digital records of state legislatures. The production describes the work of the “A Model Technological and Social Architecture for the Preservation of State Government Digital Information” project, which is supported by the Library of Congress National Digital Information Infrastructure and Preservation Program.

Direct to Video (via Minnesota Historical Society)
It runs about six minutes.

Source: National Digital Information Infrastructure and Preservation Program

Milestones: The British Library’s Digital Library Passes 500,000 Items

Friday, November 13th, 2009

From the Announcement:

The British Library has added the 500,000th item to its long-term Digital Library System. The milestone item was a digitised copy of a newspaper originally published in 1864 and scanned as part of the Library’s 19th Century British Library Newspapers project, which recently made more than 2 million pages of historic newspapers available online. [Subscription Required].

[Snip]

Steve Green, Head of the Digital Library Programme at the British Library said: “The task of collecting, preserving and providing long-term access to the nation’s digital assets is in many ways a daunting and complex undertaking. The sheer amount of material being published digitally is challenging enough in itself, but the wide range of different formats – many of which will inevitably become obsolete – makes preservation and future accessibility far from straightforward. The Digital Library Programme has made huge progress in the past few years and we now have the foundations of a robust and fully scaleable system that can handle large quantities of digital items, ensuring their availability for future generations of researchers just as our historic print collections remain available for users today.”

Currently the Digital Library System holds:

+ 386,000 items received through the Voluntary Deposit of Electronic Publications (VDEP) scheme
+ 23,000 British Library Sound Archive master files
+ 65,000 19th century digitised books
+ 2,000 electronic journal items
+ 29,000 newspaper items

Source: British Library

Digital Preservation: Two New Publishers Join CLOCKSS

Monday, November 9th, 2009

From the Announcement:

CLOCKSS is pleased to announce that two new society publishers have recently joined the CLOCKSS archive. The Royal Society of Chemistry and the Royal Society have signed agreements this fall to join CLOCKSS and preserve their materials in the CLOCKSS network of geographically and geopolitically distributed archive nodes. CLOCKSS (Controlled Lots of Copies Keep Stuff Safe) is a community-governed, not-for-profit archive founded by librarians and publishers to ensure the long-term availability of scholarly digital content.

As part of joining CLOCKSS, the two societies agree to release their archived content to the world for free if a time comes when it is no longer available from any publisher (“trigger event”).

Access the Complete Announcement

Source: CLOCKSS

Digital Preservation: ACM Will Partner with Portico and CLOCKSS for Preservation of Its Digital Library Resources

Friday, November 6th, 2009

From an Announcement:

ACM (the Association for Computing Machinery) announced today that it is providing its institutional library customers with advanced electronic archiving services to preserve their valuable electronic resources. These services, provided by Portico and CLOCKSS, address the scholarly community’s critical need for long-term solutions that assure reliable, secure, deliverable access to their burgeoning digital collection of scholarly works. ACM is offering these services to protect the vast online collection of resources in its Digital Library (DL), which are used by over 1 million computing professionals and students worldwide.

“By partnering with Portico and CLOCKSS, we are able to meet a growing demand in the library community for a trusted, reliable third-party archive, and to ensure that digital collections remain accessible to future scholars, researchers, and students,” said Scott Delman, ACM Group Publisher. “Scientific discovery and the educational process are not possible without reliable access to the accumulated scholarship of the past and secure preservation of the scholarly record, and these agreements are a clear step forward with the relationship between the ACM and the library community.”

By investing in long-term digital preservation of content, ACM’s aim is to make it easier for libraries to accelerate their transition away from print and free up resources invested in print collections in favor of new and innovative electronic products and services.

Much More After a Click

New Video on Web Archiving

Friday, November 6th, 2009

From the Description:

Web content changes all the time. If we don’t save that content before it disappears, a major part of our cultural history will be lost.

The Library of Congress is working to provide permanent access to web content of historical importance. It selects websites for collection, requests permissions from the website owners, addresses the technology of collecting websites and preserves the websites and makes them available.

This video examines those four challenges.

Access the Video

A text transcript is also available (PDF)

Source: National Digital Information Infrastructure and Preservation Program

Bibliotheca Alexandrina: A Digital Revival

Tuesday, November 3rd, 2009

The Bibliotheca Alexandrina is one busy place. If you want to learn more, read our highlights below, but make sure to read the complete article. Our highlights are just a sample of what’s going on.

From the Article:

The International School of Information Science (ISIS), a research institute affiliated with the BA [Bibliotheca Alexandrina], aims at furthering the BA’s goals of being a leading institution in knowledge dissemination and, specifically, promoting research and development related to the digital libraries. Toward that goal, ISIS has embarked on an array of ambitious projects, in partnership with world-class institutions. These include hosting a mirror site for the Internet Archive, participating in the Million Book Project, organizing the digital archive of the Gamal Abdel Nasser collection, digitizing 113 years of Al-Hilal magazine, presenting the first-ever complete digital version of Description de l’Egypte, conducting advanced research such as the Arabic component of the UN-sponsored Universal Networking Language computerized multi-language translation program, and offering the most advanced 3D virtual imaging techniques in a virtual immersive environment for science and technology applications. Thus, despite being barely seven years in existence, the BA already has a substantial record of achievements.

Among the other projects you’ll read about are:

+ The Digital Assets Repository (DAR)

+ Memory of Modern Egypt Digital Repository

+ Archive documenting the history of the Suez Canal

+ SuperCourse

To empower science educators worldwide, the BA is working with a team of specialists, in partnership with the University of Pittsburgh, to launch the first science SuperCourse, comprising thousands of PowerPoint lectures made available for free to teachers and lecturers, who can use the lectures as they see fit in their teaching of science. The SuperCourse has been effectively implemented in the area of Public Health and Epidemiology, with a network of 65,000 scientists in 174 countries, providing more than 3,500 lectures in 31 languages. The BA maintains a mirror site of SuperCourse, which receives an average of one million hits per month, and is working on setting up a similar course in all fields of science.

Much More in the Complete Article

Source: EDUCAUSE Review
Hat Tip: OAN

GeoCities Says So Long as Internet Archive Works to Preserve Content

Tuesday, October 27th, 2009

In August, we first posted about the Internet Archive (IA) asking GeoCities users to make sure their content was archived by the IA. Why? As of yesterday, GeoCities is no longer online.

From the Article:

Yahoo, which acquired the site for $3.57bn (£2.17bn) in 1999 at the height of the dotcom boom, said sites would no longer be accessible from 26th October.

However, many of the pages have been archived and will still be available to view via the nonprofit Internet Archive project.

The giant digital library, which has been archiving the public web since 1996, has set up a special project to archive GeoCities before it is lost forever.

“We’ve collected a lot of GeoCities sites over the years – but might not have every site and every page,” the Internet Archive said.

Access the Complete Article

Source: BBC

See Also: Saving a Historical Record of GeoCities (via Internet Archive)

Library of Congress’ National Digital Information Infrastructure and Preservation Program Wins Government Computer News Award

Saturday, October 24th, 2009

The NDIIPP was one of 11 projects to receive a GCN [Government Computer News] Award for Agency IT Achievement.

From the Summary:

It took two centuries for the Library of Congress to acquire its 29 million books and 105 million other items. Today, it only takes 15 minutes for the world to produce an equal amount of information in digital form, creating unprecedented archiving challenges for the Library of Congress. The Library is meeting the challenge of digital preservation by developing new tools to transfer large quantities of digital content. To date, more than 3 million files have been transferred and stored using the BagIt specification. Due to the Library’s digital preservation initiatives, more than 1,000 collections of digital content have been selected, captured, preserved, and made available to the U.S. public and online visitors across the globe.

Access the Complete Article

We are warned to be careful about what we put online because data on the Internet lives forever. But keeping random copies of files on servers, routers and databases is not the same as preservation, said Martha Anderson, director of program management for the Library of Congress’ National Digital Information Infrastructure and Preservation Program. Digital data can be ephemeral. “That is the paradox,” she said.

Much More in the Summary and Complete Article

Source: GCN

See Also: Library of Congress News Release

Getting to Know the HathiTrust Digital Library

Friday, October 23rd, 2009

Barbara Quint Writes:

With all the controversy still swirling around Google Books and its post-settlement offerings, an alternative route to the millions of digitized books and journals supplied by leading Google Book Search library partners has arrived. The HathiTrust (www.hathitrust.org) is a collaboration of 25 research libraries already participating in Google Book Search to produce a shared digital repository for preservation and access to a curated collection. By mid-November, the HathiTrust Digital Library will have a full-featured, full-text search service for 4.3-5 million items. The searches will retrieve bibliographic citations and page references, including those for in-copyright books. Content will extend beyond the digitized copies of books returned to early library partners by Google. HathiTrust is pushing to acquire other digitized special collections from its members, as well as making arrangements for opening access to university press books.

[Snip]

The new launch will open indexing to nearly 1.5 billion pages from well over 4.3 million volumes, with full-text searching by keyword or phrase. (Just between us, if you simply cannot wait until mid-November, go to http://babel.hathitrust.org/cgi/ls. [John] Wilkin, [associate university librarian at the University of Michigan and executive director of the HathiTrust], tipped me off that, although this “experimental search” site claims to search only 500,000 documents, it actually includes the full 4.3-5 million volumes. Feedback options appear at the top and bottom of each search results page.) The system already had the equivalent of library cataloging searching, though they expect to upgrade even that kind of searching under a cooperative program with OCLC.

Much More in the Complete Article

Source: InfoToday NewsBreaks

Article: Missing Links: The Enduring Web

Thursday, October 22nd, 2009

From the Abstract:

The Web runs at risk. Our generation has witnessed a revolution in human communications on a trajectory similar to that of the origins of the written word and language itself. Early Web pages have an historical importance comparable with prehistoric cave paintings or proto-historic pressed clay ciphers. They are just as fragile. The ease of creation, editing and revising gives content a flexible immediacy: ensuring that sources are up to date and, with appropriate concern for interoperability, content can be folded seamlessly into any number of presentation layers. How can we carve a legacy from such complexity and volatility?

Access the Complete Article (PDF)

Source: International Journal of Digital Curation (4.2)

Preserving Internet Content

Tuesday, October 13th, 2009

From the Web Site:

On October 7, 2009, the IIPC [International Internet Preservation Consortium] sponsored a free, one-day event, Active Solutions for Preserving Internet Content, following iPRES 2009, the 6th International Conference on Preservation of Digital Objects, held at the Mission Bay Conference Center, San Francisco. Slide presentations are available on the conference program page.

Presentations with Slides Include:

+ Billions and billions of objects, METS, PREMIS, oh my! (Gina Jones)

+ Preserving Access – Making more informed guesses about what works (David Pearson)

+ “Here be dragons” – Strategies for dealing with viruses in the web archive (Matt Holden)

+ Say Emulate; He Says Migrate (David Pearson)

+ Keep Websites Alive (Jeffrey van der Hoeven)

+ What do web archivers (or is it archivists) really do? (Gina Jones)

+ Web Archives Are Forever: defining a workflow for long term preservation of web archives (Maureen Pennock)

+ Square pegs? Fitting web archives into the digital preservation repository of the National Library of New Zealand (Kevin De Vorsey)

+ Continuity and Preservation: The National Archives approach to maintaining permanent access to the web presence of UK Central Government (Amanda Spencer and Alison Heatherington)

+ It’s the end of a project, as we know it: a leading discussion on experiences and issues in embedding web archiving and preservation in an organization (Marcel Ras and Hilde van Wijngaarden)

Source: netpreserve

NDIIPP Releases Web Archiving Video

Friday, October 9th, 2009

From the Story:

Web content changes all the time. If we don’t save that content before it disappears, a major part of our cultural history will be lost.

This is the message of the second video in the Library of Congress National Digital Information Infrastructure and Preservation Program’s video series. The just-released video, “Web Archiving,” discusses the Library’s approach to collecting and preserving content found on the World Wide Web.

The three-minute video is targeted to librarians, archivists, and others interested in working with digital content.

[Snip]

The “Web Archiving” production is the second in the series, following the BagIt video that was released in July 2009. The BagIt video describes a specification for securely transferring digital content.
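BagIt packages content as a “bag.” A `data/` directory holds the payload, a `bagit.txt` file declares the format, and a manifest lists a checksum for every payload file. The receiver can then confirm that nothing changed in transit. The Python sketch below is a simplified illustration. It assumes the layout of the later BagIt 1.0 standard (RFC 8493) rather than the draft versions in use in 2009, and it uses SHA-256 as the checksum algorithm. `make_bag` and `verify_bag` are hypothetical helpers, not the Library of Congress’s own tools, and optional tag files such as `bag-info.txt` are omitted.

```python
import hashlib
from pathlib import Path


def make_bag(bag_dir: Path, files: dict[str, bytes]) -> None:
    """Write a minimal bag: bagit.txt, a data/ payload, and manifest-sha256.txt."""
    payload_dir = bag_dir / "data"
    payload_dir.mkdir(parents=True, exist_ok=True)
    manifest_lines = []
    for rel_name, content in sorted(files.items()):
        target = payload_dir / rel_name
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(content)
        digest = hashlib.sha256(content).hexdigest()
        # One line per payload file: checksum, whitespace, path relative to the bag root.
        manifest_lines.append(f"{digest}  data/{rel_name}\n")
    (bag_dir / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n", encoding="utf-8"
    )
    (bag_dir / "manifest-sha256.txt").write_text("".join(manifest_lines), encoding="utf-8")


def verify_bag(bag_dir: Path) -> bool:
    """Return True if the payload matches the manifest exactly (same files, same checksums)."""
    listed = {}
    manifest = (bag_dir / "manifest-sha256.txt").read_text(encoding="utf-8")
    for line in manifest.splitlines():
        if line.strip():
            digest, rel_path = line.split(None, 1)
            listed[rel_path.strip()] = digest
    on_disk = {
        p.relative_to(bag_dir).as_posix()
        for p in (bag_dir / "data").rglob("*")
        if p.is_file()
    }
    if on_disk != set(listed):
        return False  # a payload file was added or removed after bagging
    return all(
        hashlib.sha256((bag_dir / rel).read_bytes()).hexdigest() == digest
        for rel, digest in listed.items()
    )
```

After a transfer, the receiving side would run a check like `verify_bag`. A changed, added, or missing payload file makes it return `False`.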

View the Web Archiving Video

Video Presentations Homepage

Source: National Digital Information Infrastructure and Preservation Program, Library of Congress

A Look at the Major League Baseball Video Library Film Archive

Thursday, October 8th, 2009

If you’re a baseball fan, this is a “must read.”

From the Article:

No American sport has a past as deep and cherished as baseball’s. But precious little of the sport’s history is preserved in moving images. Much occurred before the television age, leaving only grainy, scattershot clips culled from newsreels and home movies — and rarely does it show a player of [Babe] Ruth’s stature.

The newly arrived Ruth film is part of the video collection of Major League Baseball Productions, the league’s official archivist, which spans more than 100 years and includes about 150,000 hours of moving images. Most of the collection is stored in plastic cases that line metal shelves of a room labeled “Major League Baseball Film and Video Archive.” The overflow rests in storage a few miles away, in Fort Lee, N.J.

The article goes on to describe how Frank Caputo, manager of the MLB Network video library film archive, and Joe Porciello researched a newly discovered 8-millimeter clip (it was found by a New Hampshire man in his grandfather’s home movie collection).

Source: The New York Times

See Also: Just in Time for the Major League Playoffs and World Series: Baseball Resources at the Library of Congress Web Guide

On Google and Usenet

Wednesday, October 7th, 2009

The article begins with one paragraph about Google Book Search but the story actually focuses on the Usenet archive (Google Groups).

From the Article by Kevin Poulsen:

…a few geeks with long memories remember the last time Google assembled a giant library that promised to rescue orphaned content for future generations. And the tattered remnants of that online archive are a cautionary tale in what happens when Google simply loses interest.

That library is Usenet, a vast internet- and dial-up-based message board system erected in 1980. Though moribund today, for decades Usenet was the paper of record for the online world, and its hundreds of millions of “newsgroup” postings chronicle everything from the birth of the web to the rise of Microsoft, as well as more trivial matters.

In February 2001, Google rescued that history when it acquired the New York-based Deja.com, and with it a Usenet archive going back to 1995. It turned the archive into Google Groups, in a move that was cheered by net geeks who had seen Deja’s reliability declining, and were certain that the supremely competent Google would save it.

[Snip]

Flash forward nearly eight years, and visiting Google Groups is like touring ancient ruins.

[Snip]

Searching within a newsgroup, even one with thousands of posts, produces no results at all. Confining a search to a range of dates also fails silently, bulldozing the most obvious path to exploring an archive.

[Snip]

“The search results are extremely poor,” says network pioneer Brad Templeton. “Like nobody cares.”

Henry Spencer, whose Usenet archive forms much of Google Groups, is troubled by the company’s curatorship. “Google does get a lot of credit for putting it together and making it available,” Spencer says. “But search capabilities are important for such a large collection of data. The archive’s value to the community is considerably reduced if it’s not conveniently searchable.”

Source: Wired

Legal Delays Have Blown a Hole in UK’s Digital Heritage

Monday, October 5th, 2009

From the Article:

Digital literature, online scientific research and internet journalism that should have been saved in the nation’s main libraries over the past five years may have been lost because ministers have failed to give them the legal power to copy and archive websites, the Guardian has learned.

Senior executives at the British Library and the National Library of Scotland (NLS) are dismayed that legislation giving them the right to collect online and digital material is still not in force, more than six years after it was passed by parliament.

The omission has meant the libraries – which are legally required to archive books, newspapers and journals – have failed to record online coverage of major events such as the Iraq and Afghanistan wars, the release of the Lockerbie bomber and the MPs’ expenses scandal.

[Snip]

Phil Spence, head of operations at the British Library, said the failure had left a major “digital black hole” in the library’s collections, with huge gaps in the archives for researchers, scientists and historians.

It meant the British Library was unable to store the BBC’s website, the National Gallery or British Museum website, any UK newspapers’ websites, or scientific journals published online because of copyright issues. Blogs, community pages, government and business websites can only be archived after laborious voluntary agreements. The act would protect the libraries against copying defamatory material, but would also protect a publisher’s copyright.

“We’ve lost five years of digital content which is gone potentially for ever, and the ability of the nation to capitalise on that as well,” he said.

Much More in the Full Text Article Including a 3.5 Minute Audio Report

Source: The Guardian

Report From Digital Preservation Workshop Held in DC

Monday, October 5th, 2009

From the Report:

Over twenty Library of Congress staff had an opportunity to participate in a special workshop, Digital Preservation Management: Implementing Short-term Strategies for Long-term Problems, hosted by the Inter-university Consortium for Political and Social Research, held September 21-22, 2009 in Washington, DC.

Initially developed at the Cornell University Library and supported with funding from the National Endowment for the Humanities, the Digital Preservation Management workshops are structured curricula geared toward managing digital preservation planning and policies for libraries, archives, and other cultural heritage institutions. The goal of the workshop is to provide those managers and staff responsible for digital assets the practical means to exercise stewardship in an age of technological change. Many institutions struggle with the initial stages of developing digital preservation policies, and the workshop aids participants in understanding the fundamental pieces of how to think about and enact planning for organizations.

[Snip]

The next five-day workshops will be held October 11-16, 2009 at the University of Michigan – where Martha Anderson, director of program management for the National Digital Information Infrastructure and Preservation Program, will be the keynote speaker – and June 13-18, 2010 at MIT in Cambridge, Massachusetts. For more information about the workshops, please visit: www.icpsr.umich.edu/dpm/workshops/fiveday.html.

Source: National Digital Information Infrastructure and Preservation Program / Library of Congress

NDIIPP Conducts Two Day Workshop on Preserving Digital News

Friday, October 2nd, 2009

From the Post:

The Internet has impacted news and journalism more than almost any other category of information. Newspapers have always been important research resources for users of libraries, archives and historical societies. But significant events are now reported in new ways, such as through blogs, podcasts, social-networking services, online news aggregators and multimedia web content. To address this change, the National Digital Information Infrastructure and Preservation Program convened a two-day workshop to discuss a national strategy for collecting and preserving news content that is disseminated only in digital form.

The meeting on September 2-3, 2009, brought together over fifty invited specialists in the field: creators, distributors, archivists, and researchers who depend upon historical news. The topics for discussion included the following:

+ What is digital news? Who produces it? What forms does it take?
+ What is important to preserve for the nation?
+ What collaborative efforts for preservation are succeeding now?
+ What are the roles for content owners and public archives in preserving digital news?
+ What roles do “local” and “national” content and organizations serve?
+ What are some strategies and possible models for addressing the issues in a distributed way?

A number of lively conversations among the diverse participants prompted several innovative solutions, including blogs that self-archive and newspapers that opt in to public institution web archiving. Case studies and analyses of how historical news is consumed and used, especially with regard to dynamic and multi-media content, were suggested. Local news blogs were deemed an important area to monitor as they seem particularly at-risk and ripe for a distributed solution.

A Bit More in the Complete Article

Source: National Digital Information Infrastructure and Preservation Program (NDIIPP) / Library of Congress