Archive for the ‘Digital Preservation’ Category

Web Archiving: Administration Wants Help Archiving its Facebook, Twitter Content

Wednesday, September 2nd, 2009

And yet another role for the information professional.

From the Article:

The Executive Office of the President (EOP) plans to hire a company to help archive the ever-expanding amount of data that qualifies as presidential records that the office publishes on publicly accessible Web sites and social networking sites, according to a recently published solicitation notice.

The EOP wants a contractor to capture and store content posted on the sites that the administration is required to maintain under the Presidential Records Act (PRA), the notice said. According to the request for quote (RFQ) notice, the contractor will also be responsible for transferring the captured data to the National Archives and Records Administration (NARA) for historical preservation.

[Snip]

White House officials want the company to crawl and archive PRA content on third-party Web sites where the EOP maintains a presence, such as the White House’s Facebook page and Twitter feed, the notice said. According to the RFQ, the EOP wants data capture to be automatic rather than how it’s currently done.

EOP officials want to capture the posted content at least twice a day, the notice said. In addition, they said the vendor will have to make the data organized and searchable and provide a Web-based tool that government employees can use to manage the record-keeping.

The data will also need to be stored in a way that will let NARA ingest the records into the agency’s next-generation Electronic Records Archives system.

Access the Complete Solicitation (PDF)

Source: FCW

English Language: A Future for Our Digital Memory: Permanent Access to Information in the Netherlands

Wednesday, September 2nd, 2009

From the Announcement:

A future for our digital memory: permanent access to information in the Netherlands is a twenty-page English-language summary of the report of the Dutch National Digital Preservation Survey; the Dutch report is dated 1 July 2009.

In order to underpin its strategy, the NCDD decided to build a detailed picture of the current situation in the public sector in the Netherlands. Can institutions or domains be identified which have successfully risen to the challenge of digital preservation and permanent access? What categories of data are in danger of being lost? How can the risks be managed? The so-called National Digital Preservation Survey was funded by the Ministry of Education, Culture and Science, and was held in the first six months of 2009.

A team of three researchers conducted some seventy interviews with stakeholders in three distinct sectors: government & archives, the research community, and cultural heritage institutions.

Access the Complete English Language Report: A future for our digital memory: permanent access to information in the Netherlands, English-language summary

Source: Netherlands Coalition for Digital Preservation

A Data Deluge Swamps Science Historians

Friday, August 28th, 2009

From the Article:

In a vault beneath the British Library here, Jeremy Leighton John grapples with a formidable challenge in digital life. Dr. John, the library’s first curator of eManuscripts, is working on ways to archive the deluge of computer data swamping scientists so that future generations can authenticate today’s discoveries and better understand the people who made them.

His task is only getting harder. Scientists who collaborate via email, Google, YouTube, Flickr and Facebook are leaving fewer paper trails, while the information technologies that do document their accomplishments can be incomprehensible to other researchers and historians trying to read them. Computer-intensive experiments and the software used to analyze their output generate millions of gigabytes of data that are stored or retrieved by electronic systems that quickly become obsolete.

Source: Wall Street Journal
Hat Tip: ACM Tech News

Experts Discuss Saving Public Policy Web Content

Thursday, August 27th, 2009

From the Post:

Curators and public policy experts representing commercial, academic and non-profit organizations convened for a two-day meeting at the Library of Congress to explore strategies for preserving public policy content that has been made available only on the web.

As more and more of existing public policy content is only available on the web, the challenge of providing enduring access to, and long-term preservation of, public policy information is increasingly complicated. The Library’s National Digital Information Infrastructure and Preservation Program is exploring ideas about how to work with others to preserve this information.

NDIIPP is interested in this area as part of its work to catalyze development of a national collection of digital content through a national network of preservation partners. To date, the Program has engaged over 130 partners from the public and private sectors to work together to develop approaches and solutions for saving America’s digital heritage.

Source: NDIIPP / Library of Congress

Portico Announces Digital Preservation Agreement with University of Technology Sydney Library

Tuesday, August 25th, 2009

From the Announcement:

Portico (www.portico.org) is pleased to announce the signing of an agreement with University of Technology Sydney (Australia) Library (UTS Library) to preserve its online collection of 11 e-journals. UTS Library created UTSeScholarship in 2004, an initiative that provides a secure, stable, digital home for the scholarly output of the University’s staff, students, and colleagues with whom they collaborate. UTSeScholarship includes UTSePress, which publishes these scholarly journals as well as books and conference proceedings.

Through this agreement with Portico, UTS Library ensures that the online versions of its journals will be preserved and available for future scholars, researchers, and students. UTS Library has named Portico as a mechanism to fill post-cancellation access claims and has also agreed to make an annual financial contribution to Portico.

See Also: Portico Homepage with the Latest Portico Stats

Source: Portico

Internet Archive Requests Your Help in Preserving GeoCities Materials

Tuesday, August 25th, 2009

From a Blog Post:

Yahoo! announced that it will close the site on October 26, 2009, steering users towards their paid service instead. We have been archiving GeoCities sites for years in our crawls, but, as goes with the territory of being web archivists, we want to make sure to gather as many of the sites as possible before the looming end of an era, 10-26-2009. If you have a page with GeoCities or are a fan of a particular page, please use our special collections page to ensure its preservation. Additionally, please refer to the Archive Team’s efforts to save cultural information that may be lost with the site closing. Yahoo! is also offering valuable advice at their help center.

Source: Internet Archive

Old South Carolina Newspapers to be Digitized, Put Online

Monday, August 24th, 2009

From the Article:

A rich but dusty archive of South Carolina history — newspapers published throughout the state from 1860 to 1922 — will become Web accessible through the S.C. Digital Newspaper Project, an initiative of University Libraries.

The project, funded with a two-year $350,000 grant from the National Endowment for the Humanities (NEH), will scan, enhance, and deliver to the Library of Congress an estimated 100,000 pages of selected S.C. newspaper titles. The resulting online archive will reflect major artistic, literary, religious, ethnic, cultural, economic, and political events in South Carolina and surrounding states.

Source: University of South Carolina

Meet the Digital Preservation Pioneers: David Riecks

Friday, August 21st, 2009

From the Article:

When you take a digital photo, the camera immediately records information about the aperture setting, shutter speed, focal length, metering mode and more. Some cameras have a global positioning system that adds information about where the photo was taken.

But David Riecks (rhymes with “clicks”) – professional photographer, digital-image technologist and metadata evangelist – believes that a digital photo should contain more information about itself. And he is on a mission to spread the word about the current and long-term value of photo metadata.

[Snip]

He became intrigued by keyword search and he developed a home-grown approach to metadata, judiciously adding keywords and descriptive metadata to each digital image. If a client requested a photo of a cow in a pasture, Riecks could search his image database for the keywords “cow” and “pasture” and quickly locate any of his photos that contained those keywords. The system helped streamline his business.

See Also: See the Complete List of Digital Preservation Pioneers

Source: NDIIPP / Library of Congress

Webcast: Ensuring the Longevity of Digital Documents

Thursday, August 20th, 2009

Yesterday, we posted: Digital Preservation: LOCKSS Chief Scientist David Rosenthal Speaks at Library of Congress. The post contained links to slides and several related documents but not the video of Rosenthal’s presentation.

Today, the video was made available on the LC web site. You can view it here. The video runs 76 minutes.

Source: Library of Congress

Conference Paper: Citizen-Created Content, Digital Equity and the Preservation of Community Memory

Wednesday, August 19th, 2009

The following paper, written by Penny Carnaby (National Library of New Zealand) and to be presented by Sue Sutherland (National Library of New Zealand), will be delivered at the upcoming World Library and Information Congress: 75th IFLA General Conference and Assembly in Milan, Italy.

From the Abstract:

While the complex issues concerning the protection and preservation of digital assets are better understood by the information professions, there is still much thinking required about the preservation and protection of the new wave of citizen-created content.

Traditionally information professionals in all types of memory institutions have clearly met the need for, and nature of, the preservation activities around formal and authoritative knowledge services and systems. However, informal, citizen-created knowledge activities are far less straightforward in terms of preservation. These activities arise and evolve as individual citizens develop as authors, content creators, thought leaders, filmmakers, blog diarists, etc. There is at present an extraordinary unleashing of content creation by individual citizens.

This development challenges established organisational systems and professional practice in an unprecedented way. This paper outlines some of the issues involved in the preservation of digital assets in this new environment. It explores how all memory institutions including archives, galleries, museums and libraries in particular, can value and protect a country’s digital assets in both the formal and informal arena.

Access the Full Text (10 pages; PDF)

Source: International Federation of Library Associations

Digital Preservation: LOCKSS Chief Scientist David Rosenthal Speaks at Library of Congress

Tuesday, August 18th, 2009

From the Summary:

When David Rosenthal talks, people listen. They may not always agree with the Chief Scientist of the LOCKSS program based at Stanford University, but they engage with what he has to say.

This was the case on July 27, 2009, when a large crowd gathered at the Library of Congress to hear Rosenthal’s presentation How Are We Ensuring the Longevity of Digital Documents? (pdf, 321 Kb). Rosenthal’s talk was a reprise of his widely-discussed plenary at the Spring 2009 Coalition for Networked Information Task Force meeting. In his introduction at the CNI meeting, CNI Executive Director Clifford Lynch told the audience that Rosenthal’s work had changed his thinking about digital preservation.

+ Rosenthal’s presentation at the Library was filmed and will soon be available as a webcast. Update: The video (76 minutes) is now available.

+ How Are We Ensuring the Longevity of Digital Documents? (2009): Presentation Slides (PDF)

Source: NDIIPP, Library of Congress

The “What’s New at the Internet Archive” Blog

Thursday, August 13th, 2009

The other day we posted some updated statistics about the many collections from the Internet Archive.

We didn’t mention that a great way to learn about new features, cool collections, etc. is the “What’s New at the Internet Archive” blog posted here.

From the looks of it the blog has about two or three posts a month. The latest posting (posted earlier today) is a lot of fun. It looks at (and provides links to) drive-in movie advertising accessible via the IA.

Almost forgot: if you prefer RSS, no problem. The “What’s New at the Internet Archive” blog has an RSS feed accessible here.

Want Internet Archive news? Take a look at their Twitter feed: http://twitter.com/internetarchive

What’s the average lifespan of a Web page?

Wednesday, August 12th, 2009

Marieke Guy does an impressive job pulling together several estimates and the underlying papers they come from. It’s a challenging question, and it becomes more so each day in this time of Twitter and similar services.

Here are some of those numbers:

+ 44 days (the average lifespan of a URL) from Brewster Kahle’s 1997 article, “Preserving the Internet.”

+ 75 days from Michael Day’s Report: Collecting and preserving the world wide web

+ 100 days from a November, 2003 Washington Post article: On the Web, Research Work Proves Ephemeral.

The Internet Archive now gives 44-75 days as its ballpark figure.


Access the full post and underlying documents.

Source: JISC-PoWR

European Publishers Target Google

Wednesday, August 12th, 2009

From the Article:

Google is facing growing opposition in Europe to its landmark US legal settlement with book publishers and authors, raising a fresh challenge to an agreement that could help determine the future structure of the digital books business.

The article focuses on issues in France, Germany, and from a group of Nordic publishing associations. It also includes comments from Google and Peter Brantley from the Internet Archive.

Source: Financial Times

New Zealand Launches Digital Continuity Action Plan

Tuesday, August 11th, 2009

From the Web Site:

The Digital Continuity Action Plan is a world first initiative [our emphasis] which will prevent important public records being lost and ensure today’s information is available tomorrow. Today most public information is created digitally, but the continuity of that digital information over time has become a real concern. To address this concern the plan has been developed as an all of public sector programme to assist and support agencies overcome issues with storing, accessing, using and reusing the digital information they produce.

Read the Plan (HTML) ||| PDF

Source: Archives New Zealand

K-12 Web Archiving Program

Saturday, August 8th, 2009

From the Article:

Following a successful pilot program during the spring of 2008, the Library of Congress, Internet Archive and California Digital Library initiated a web archiving program that explored archiving websites from the perspective of students in elementary, middle and high schools. Two Library activities supported the pilot: the National Digital Information Infrastructure and Preservation Program and the Teaching with Primary Sources program.

The K-12 Web Archiving Program gives students the opportunity to think about history by selecting sources for ongoing research use. Teens and younger students select and capture web content using Internet Archive’s Archive-It service, creating “time capsules” of what is important to them to represent their current lives.

During the 2008-09 school year, students from ten different schools in nine states participated in the program. Over 1,700 websites and 233 million URLs, or objects, were collected during the year, totaling 11.7 terabytes of data. The Internet Archive noted that 96 percent of the websites selected by students have not been archived by any other Archive-It partner, and 24 percent of the websites are not in the Internet Archive’s general archive. Examples include websites for the Iowa Farm Bureau, Women’s Adventures in Science, and How to Make a Sock Monkey. In total, 68 web collections were created – including a Prom Guide and Historical Black College Search collection – and immediately accessible on the Archive-It website.

Students and teachers alike found the program eye-opening. Student comments included “choosing the websites was really fun because it let everyone be creative and really think about what teenagers enjoy today,” and “I had never thought of archiving websites, even though in this day and age we use them as much as and more than books.” Teacher Emily Patterson of George Washington High School in Charleston, West Virginia said, “I think it was certainly an enriching experience. I like that it allowed them to see and examine their lives and Internet content as history in the making.”

Source: National Digital Information Infrastructure and Preservation Program / Library of Congress

Web Archiving Service Preserves Data for the Future

Saturday, August 8th, 2009

Note: We first posted about this new web archiving service a few weeks ago. Here’s a bit more information.

From the Announcement:

Researchers and scholars now will be able to delve into archived Web sites captured by the California Digital Library’s Web Archiving Service (WAS). This new tool enables faculty, researchers and librarians to capture, curate and preserve Web sites, thus creating permanent archives available to researchers everywhere. The social history of our times is now being preserved in archives as rich and varied as the contentious 2003 California recall election, hundreds of California state Web archives, the Guantanamo Bay Detention Camp Web archive and the Middle East Political Sites archive. New archives continually are being built and published and will appear along with the current archives, available at webarchives.cdlib.org/.

The Web has revolutionized our access to information. Documents and publications that once were difficult to find now are readily available to anyone at any time. Popular reactions to historical events unfold via blogs and personal Web sites, and we have an unprecedented view into popular culture and the formation of public policy. “This is a tool that can track censorship in China, political regimes in Iran, and social commentary around the world,” states Laine Farley, California Digital Library’s executive director. “CDL and the UC libraries are leading the way in building collections for the 21st century.”

Ready access to these publications cannot be taken for granted. Web pages and documents are as easy to change or remove as they are to publish. When sites are redesigned, when new administrations take office, when policies or organizations change, we witness the wholesale disappearance of information. State and local Web publications particularly are at risk. In many cases, these documents no longer are available in print, and libraries are challenged to continue their historic role as cultural memory institutions in the digital environment.

Source: University of California

Pilot Program: Preservation in the Cloud

Friday, July 24th, 2009

From the Report:

The Library of Congress National Digital Information Infrastructure and Preservation Program has launched a pilot program to test cloud technologies for preserving digital content. The pilot will focus on using a new service, DuraCloud, developed and hosted by the DuraSpace Foundation.

Source: LC / NDIIPP

The British Library Publishes Annual Report (2008-2009)

Wednesday, July 22nd, 2009

Access the Complete Report (Flash Version) or Text Only Version

From the Chief Executive’s Page:

There is enormous public and educational interest in the digitisation of our historic newspapers and we end the year with some three million digitised and fully searchable pages available online. We are now poised to work with a commercial partner to significantly scale up this effort over the coming years. This year we have also completed the digitisation of around 70,000 books and 12,000 recordings.

[Snip]

We have taken significant steps forward in services to support the scientific research community this year. TalkScience events have proved popular: our science collections are supporting a wealth of research – from interpreting our recordings of frog calls and investigating volcanic activity in the 18th century to using our contemporary and rare biomedical journals to inform a study on lactose intolerance. UK PubMed Central is rapidly growing as an open-access database service, with new facilities being added regularly.

Source: BL

California Digital Library Public Web Archive Service Collections Launched

Wednesday, July 22nd, 2009

From the Article:

The California Digital Library has opened its Web Archiving Service collections. Topics in the collection range from California government agencies to Middle Eastern politics to natural disasters. The institutions that harvested and curated the websites include New York University, the University of North Texas, Stanford University and several University of California campuses.

Users can browse the public archives by URL or search by keyword. Users can also view changes over time for a given web site if that website was harvested more than once. This feature is especially useful when comparing something like the daily reporting on the 2007 Southern California wildfires or a quick check to see what new documents were added to a site from crawl to crawl.

Direct to California Digital Library Web Archiving Service

Source: National Digital Information Infrastructure and Preservation Program at the Library of Congress