Archive for the ‘Digitization Projects’ Category

The Complete Archive of National Geographic Magazine on Six DVDs

Sunday, November 1st, 2009

Every now and then a fee-based product comes around that we believe deserves your attention. The following is one of them.

Chris Pendleton on the Bing Blog reminds us that a major digitization project, every issue of National Geographic published from 1888 to 2008, is now available (it was officially released yesterday, according to this media announcement) on six DVDs or an external hard drive. That’s right: all of the writing, the legendary imagery, the supplements, and even the advertisements are included. For many topics, Nat Geo magazine is a resource that documents people, places, and events on a global scale. In other words, all of the reasons just mentioned, and many others, make the magazine an important part of the historical record.

By the way, the reason it was mentioned on the Bing Blog is that Bing is providing some of the technology that powers the digitized version of this recently released collection.

From the Blog Post

Nat Geo uses Bing Maps in their Geobrowse functionality which allows you to browse a map anywhere in the world to find locations where relevant articles are referenced using geographic metadata.

Yes, we still love paper and those massive collections of past issues of the print version of National Geographic Magazine that many people own (where are yours?). They’re also important.

That said, we also hear and read that for today’s student, it’s all about digital access. Yes, of course, that’s rather sad. However, a digitized archive of this size and scope can truly demonstrate the power of digital info technology for people of all ages and turn 120 years of content into important research and learning resources.

Another digitized archive of the magazine was released seven years ago, but this 120-year collection is the most complete version ever published, with more content, more search options, saving/sharing tools, interactive maps, and more. One thing we noticed right off the bat is that the new version is available for both PC and Mac. The “112 year version” was PC only.

Here are a few fast facts about the new collection. They were gleaned from the Nat Geo site (including the video overview) and the news release.

+ All Issues from October, 1888-December, 2008 are included

+ Six DVDs include more than 200,000 pages, 300 wall map supplements, more than 8,400 articles, and more than 250,000 photographs

+ All images scanned in high-resolution

+ Flip one page at a time, zoom, print

+ Geobrowse

A new Geobrowse function powered by Bing Maps that allows users with Internet access to search nearly 5,000 locations on a globe that are featured in the magazine’s archive of articles and maps.

+ Search by keyword, date, contributor, and topic; refine by date or content type

+ Browse by month or year

+ Create personalized reading lists; share these lists with other users in the Nat Geo community

+ Pre-loaded “favorite article lists” compiled by experts

National Geographic is selling the DVDs for $69.95/US and the hard drive version for $199.95/US.

The lowest price we found as of Sunday, November 1st was $42.78 from an Amazon.com Merchant. The DVDs directly from Amazon.com are $44.99/US.

We’ve ordered a copy of the DVDs and after spending some time with them we will report back.

Cornell University Library Publishes New Digitization Manual

Friday, October 30th, 2009

Our friends at TeleRead.org let us know about a new digitization manual from Cornell University Library.

From the Announcement:

“Copyright and Cultural Institutions: Guidelines for Digitization for U.S. Libraries, Archives, and Museums,” a new book published today by Cornell University Library, can help professionals at these institutions answer that question.

Based on a well-received Australian manual written by Emily Hudson and Andrew T. Kenyon of the University of Melbourne, the book has been developed by Cornell University Library’s senior policy advisor Peter B. Hirtle, along with Hudson and Kenyon, to conform to American law and practice.

The development of new digital technologies has led to fundamental changes in the ways that cultural institutions fulfill their public missions of access, preservation, research, and education. Many institutions are developing publicly accessible Web sites that allow users to visit online exhibitions, search collection databases, access images of collection items, and in some cases create their own digital content. Digitization, however, also raises the possibility of copyright infringement. It is imperative that staff in libraries, archives, and museums understand fundamental copyright principles and how institutional procedures can be affected by the law.

“Copyright and Cultural Institutions” was written to assist understanding and compliance with copyright law. It addresses the basics of copyright law and the exclusive rights of the copyright owner, the major exemptions used by cultural heritage institutions, and stresses the importance of “risk assessment” when conducting any digitization project. Case studies on digitizing oral histories and student work are also included.

The rest of the news release provides background about Peter Hirtle and Anne R. Kenney, the authors of the manual.

Access
The manual is available for purchase for $39.95 from CreateSpace.

You can also download the entire book for free by visiting the Social Science Research Network and the eCommons@Cornell.

Source: Cornell University Libraries
Hat Tip: TeleRead

The Library of Congress Unveils API for Chronicling America Digitized Newspaper Database and Directory

Friday, October 30th, 2009

What follows is a post that might be of special interest to web developers, webmasters, site owners, or anyone who can work with an API (Application Programming Interface). The API draws on a digitized collection of more than 1 million historic newspaper pages and a searchable directory of newspaper info. Even if you don’t have the technical skills required, it’s possible you know someone who does, and with their help you can partner to develop new resources, create mashups, etc. Btw, if you know of people who are able to work with an API, feel free to share this post with them.

First, some background.

We’ve posted about the Chronicling America (CA) program since the day it launched in March 2007. The project is a joint effort between the Library of Congress and the National Endowment for the Humanities to digitize historic American newspapers. In addition to the digitized newspaper database, CA also provides a newspaper directory. It’s both searchable, with a powerful interface (a great example of what good metadata can do), and browsable. The directory contains information about most American newspapers published from 1690 to today.

On June 16, 2009, we ran a story about CA reaching a milestone. CA had just hit the one million digitized pages mark. It has grown a lot since then. About five weeks ago we posted an item about CA adding more than 192,000 pages. The media release said the database at that time contained 1,442,000 digitized pages from 171 titles that were published between 1880 and 1922.

Thanks for the info, but what about the API (Application Programming Interface)?

The following is from the “About the Chronicling America API” web page:

Chronicling America provides access to information about historic newspapers and select digitized newspaper pages. To encourage a wide range of potential uses, we designed several different views of the data we provide, all of which are publicly visible. Each uses common Web protocols, and access is not restricted in any way. You do not need to apply for a special key to use them. Together they make up an extensive application programming interface (API) which you can use to explore all of our data in many ways.

The rest of the web page offers technical details about the API.

Programmable Web has also posted about the new API.

Here are a couple of highlights:

Search results available on the web site appear with terms highlighted. The API does not have access to highlight information, but it does contain thumbnails. Each page has a permalink back to the Library of Congress site, which displays the page in a zoomable, draggable viewer similar to Google Maps.

The Library of Congress is focused on making these public domain works widely available. As such, this is an API without any registration or key necessary. That’s pretty wide open.

Among the interesting technical details is that the API can return linked data via RDF. It’s good to see reference sites, especially government ones, support semantic web formats (there are now 20 APIs in our directory with RDF support.)
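For readers who want to experiment, here is a minimal Python sketch of what keyless access of the kind described above might look like. The base URL, the `/search/pages/results/` path, the `andtext`, `format`, and `page` parameters, and the `.rdf` suffix are our assumptions and do not come from the posts quoted here, so check the “About the Chronicling America API” page before relying on them. The function names and the LCCN value are made up for illustration.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed base URL; the quoted posts do not spell out any endpoints.
BASE_URL = "https://chroniclingamerica.loc.gov"


def build_page_search_url(terms, fmt="json", page=1):
    """Build a search URL for digitized pages (path and parameter names are assumptions)."""
    query = urlencode({"andtext": terms, "format": fmt, "page": page})
    return f"{BASE_URL}/search/pages/results/?{query}"


def build_title_rdf_url(lccn):
    """Build a URL for a title's linked-data (RDF) view (URL pattern is an assumption)."""
    return f"{BASE_URL}/lccn/{lccn}.rdf"


def fetch_json(url, timeout=30):
    """Fetch a URL and decode the JSON body. No API key or registration is sent."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    # Only prints the URLs; call fetch_json(...) yourself to hit the network.
    print(build_page_search_url("world's fair"))
    print(build_title_rdf_url("sn00000000"))  # placeholder LCCN, not a real title
```

Nothing in the sketch handles authentication, because the posts above say no registration or key is needed. Whether the responses actually carry thumbnails and permalinks in the shape you expect is something to confirm against a live response.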

Sources: Library of Congress, Programmable Web
Hat Tip: Dan C.

New Report: Digitisation of special collections: Mapping, assessment, prioritisation

Friday, October 30th, 2009

From the Executive Summary:

Traditionally, digitisation has been led by supply rather than demand. While end users are seen as a priority they are not directly consulted about which collections they would like to have made available digitally or why. This can be seen in a wide range of policy documents throughout the cultural heritage sector, where users are positioned as central but where their preferences are assumed rather than solicited. Post-digitisation consultation with end users is equally rare. How are we to know that digitisation is serving the needs of the Higher Education community and is sustainable in the long-term?

[Snip]

Key Findings

+ The communities of both intermediary and end users are willing to express their view on prioritising digitisation of special collections; the participation in the project was a matter of good will and the good response (see p. 25) makes evident that there is definitely interest of the professional communities to express their opinion on the matter of digitisation needs. It should be noted here that the community of intermediaries sees collections on a finer level of granularity; end users often refer to super-collections such as the holdings of an institution

+ The top user-driven priority criteria that emerged from consultation with both intermediaries and end users are: Improve access; Enhance impact on research and/or studies; Enhance impact on teaching; Allow for collaboration; Improve access outside

+ The geographic and institutional boundaries of collections nominated for digitisation are wider – this study was aimed at the higher education institutions in the UK, but 14% of the nominated collections were from institutions outside of the higher education sector, and 6% were from overseas (see p. 27)

+ The complementarity of collections is strongly favoured by both users’ communities (see section 5)

+ The criteria for digitisation nominated by intermediary and end users include general criteria but also a number of criteria where metrics can be applied; thus allowing to establish a ranking mechanism (see p. 45)

Access the Complete Report (62 pages; PDF)

Access the Final Report Appendices (94 pages; PDF)

Source: JISC, Research Information Network

Open Book Alliance Co-Founder Peter Brantley Visits Spain to Talk About the Alliance and Google Book Search

Friday, October 30th, 2009

Brantley is attending meetings in Spain and discussing the OBA and Google Book Search. He’s been interviewed by two newspapers, El Pais and Publico.es.

Here are links to both interviews in Spanish along with mechanically generated translations from two services.

1) “Google no ve libros, se limita a ver datos” (via El Pais)

+ Translation by Google: “Google does not see books, is limited to viewing data” (via El Pais)

+ Translation by Systran: “Google does not see books, is limited to see data” (via El Pais)

2) El bibliotecario que se enfrentó a Google (via Público.es)

+ Translation by Systran: “The Librarian Who Faced Google” (via Público.es)

+ Translation by Google: “The librarian who challenged Google” (via Público.es)

New Project Report: Newspaper Digitisation: British Newspapers 1620-1900

Tuesday, October 27th, 2009

From the Summary:

This report describes all of the stages and issues that occurred during a second complex mass newspaper digitisation project. The project was an innovative and challenging example of a public/private partnership between Gale Cengage Learning, CCS and the British Library.

Access the Executive Summary

Access the Complete Report (57 pages; PDF)

Source: JISC

See Also: Newspaper Digitisation News from the British Library: £33m Saves the World’s Greatest Newspaper Collection for the Nation

See Also: Video and Slides Available from OCR for the Mass Digitisation of Textual Materials Workshop

Electronic Frontier Foundation and Other Groups Send Letter to Judge in Google Book Search Case

Friday, October 23rd, 2009

From a Blog Post:

EFF today led a coalition of authors, publishers, companies and nonprofit organizations in sending a letter to the judge overseeing the Google Book Search settlement urging the Court to ensure that those concerned about the settlement receive adequate notice of, and have sufficient time to study and comment on, any amended settlement agreement that Google, the Authors Guild, and the Association of American Publishers present.

Those following the twists and turns of the Google Book Search settlement will recall that the original Fairness Hearing scheduled for October 7, 2009, was put off because of what the Court called: “significant issues, as demonstrated not only by the number of objections, but also by the fact that the objectors include countries, states, non-profit organizations, and prominent authors and law professors.” The Court received over 400 submissions about the settlement, including the EFF-led coalition of authors and publishers concerned about reader privacy, as well as significant concerns raised by the Department of Justice.

Read the Complete Letter Sent to Judge Denny Chin (4 pages; PDF)

The letter was signed by a large group of people and organizations including:

+ The Open Book Alliance*
+ Amazon.com
+ The Picture Archive Council of America
+ National Writers Union
+ Electronic Frontier Foundation
+ Pamela Samuelson (UC Berkeley Law Professor)
+ Microsoft
+ Washington Legal Foundation
+ The Internet Archive
+ Consumer Watchdog
+ Lyrasis, Nylink and Bibliographical Center for Research Rocky Mountain, Inc.
+ Public Knowledge
+ Urban Libraries Council

* The Special Libraries Association and the New York Library Association are two of the members of the Open Book Alliance.

Source: Electronic Frontier Foundation

Getting to Know the HathiTrust Digital Library

Friday, October 23rd, 2009

Barbara Quint Writes:

With all the controversy still swirling around Google Books and its post-settlement offerings, an alternative route to the millions of digitized books and journals supplied by leading Google Book Search library partners has arrived. The HathiTrust (www.hathitrust.org) is a collaboration of 25 research libraries already participating in Google Book Search to produce a shared digital repository for preservation and access to a curated collection. By mid-November, the HathiTrust Digital Library will have a full-featured, full-text search service for 4.3-5 million items. The searches will retrieve bibliographic citations and page references, including those for in-copyright books. Content will extend beyond the digitized copies of books returned to early library partners by Google. HathiTrust is pushing to acquire other digitized special collections from its members, as well as making arrangements for opening access to university press books.

[Snip]

The new launch will open indexing to nearly 1.5 billion pages from well more than 4.3 million volumes with full-text searching by keyword or phrase. (Just between us, if you simply cannot wait until mid-November, go to http://babel.hathitrust.org/cgi/ls. [John] Wilkin, [associate university librarian at the University of Michigan and executive director of the HathiTrust], tipped me off that, [our emphasis] although this “experimental search” site claims to search only 500,000 documents, it actually includes the full 4.3-5 million volumes. Feedback options appear at the top and bottom of each search results page.) The system already had the equivalent of library cataloging searching, though they expect to upgrade even that kind of searching under a cooperative program with OCLC.

Much More in the Complete Article

Source: InfoToday NewsBreaks

China: Google Responds to Complaints Regarding Copyright Issues

Friday, October 23rd, 2009

It was just a few days ago when we posted that the China Written Works Copyright Society (CWWCS) was not happy with Google over copyright issues stemming from Google Book Search.

Today, in another Wall Street Journal blog post, we learn that Google has responded to CWWCS.

From the Post:

Here is the latest from Google:

“Today we have more than 50 Chinese publishers participating in Google Book Search, who together have authorized more than 30,000 books to be found through Google web search–and made available through a short preview. We also have some Chinese books that have been scanned by our Book Search library partners; in those cases, we only make the books available as a short snippet of text–as we do with web search–unless the rightsholder authorizes a greater use. We also honor rightsholders’ preferences if they ask not to be included.”

“Like all rightsholders, Chinese authors and publishers will be able to tell Google whether or not to display their books, and will be paid if the books are included in sales or subscriptions authorized under the settlement.”

Source: WSJ

See Also: Here’s How The Story Was Reported in the China Daily
Hat Tip: James Grimmelmann, The Laboratorium

Google Book Search: Video from D for Digitize Conference is Now Available Online

Friday, October 23rd, 2009

A few weeks ago the D for Digitize Conference took place. It was sponsored by the New York Law School and organized by Professor James Grimmelmann. The focus of the conference was Google Book Search (GBS). The list of speakers/panelists reads like a Who’s Who of people representing all sides of the many issues being debated at the conference and elsewhere.

Now, you can watch each session online (free). Even two pre-conference tutorials are included. A list of sessions and speakers along with links to the videos can be accessed here.

Finally, if you want to read about what was discussed during a session before viewing the video or just don’t have time to watch, no worries. Peter Hirtle from the LibraryLaw Blog provides excellent text summaries of each session.

LibraryLaw Blog is a co-production between Peter and Mary Minow.

See Also: LibraryLaw Blog also has a Twitter feed at:
http://twitter.com/librarylaw

Washington University: Libraries receive federal grant to digitize pre-war slave lawsuits

Wednesday, October 21st, 2009

Here’s more about a very brief item we posted when IMLS National Leadership Grants were announced at the end of September.

From the Article:

Washington University Libraries received one of the largest grants in the institution’s history, a $376,426 National Leadership Grant from the Institute of Museum and Library Services. The money will fund the St. Louis Freedom Suits Legal Encoding Project, which aims to digitize pre-Civil War lawsuits that slaves brought against slaveholders in the St. Louis Circuit Court.

[Snip]

The newly funded Freedom Suits Legal Encoding Project takes the digitalization process a step further. In addition to finishing the scanning of more than 20,000 pages of city directories and court records, the project also seeks to transcribe the documents to enable full-text searches.

[Snip]

The primary novel aspect of this project is to “develop extensions to the Text Encoding Initiative (TEI) for encoding legal documents to reflect legal function, genres and roles, and employ these extensions in this collection,” according to a grant announcement.

In other words, this project seeks to develop a computer language for annotating the legal functions of documents. This language would be comparable to HTML, which is used to denote structural semantics for Web pages. Ultimately, this innovation will be integrated into TEI, the existing language, to provide a model for similar archives.

Access the Complete Article
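To make the comparison with HTML in the article above a little more concrete, here is a small Python sketch of what encoding a legal document’s function and roles could look like. It is only an illustration under our own assumptions: the project’s actual TEI extensions are not described in the article, so the `type` and `role` values, the function names, and the placeholder party names below are all hypothetical.

```python
import xml.etree.ElementTree as ET


def encode_filing(doc_type, parties, body_text):
    """Wrap a transcription in TEI-style markup.

    The "type" and "role" values are hypothetical stand-ins for whatever
    vocabulary the project's extensions end up defining.
    """
    div = ET.Element("div", {"type": doc_type})
    for role, name in parties:
        ET.SubElement(div, "persName", {"role": role}).text = name
    ET.SubElement(div, "p").text = body_text
    return div


def names_with_role(div, role):
    """Return the names tagged with a given legal role."""
    return [el.text for el in div.findall(f"persName[@role='{role}']")]


if __name__ == "__main__":
    filing = encode_filing(
        "petition",
        [("plaintiff", "Plaintiff A"), ("defendant", "Defendant B")],
        "Placeholder text of the petition.",
    )
    print(ET.tostring(filing, encoding="unicode"))
    print(names_with_role(filing, "plaintiff"))
```

The point of markup like this is that, once documents are transcribed and tagged, a search can ask for every petition or every named plaintiff rather than just matching keywords in the full text.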

Source: Student Life (Washington University, St. Louis, MO)

Google Books Settlement: The Chinese Chapter

Tuesday, October 20th, 2009

From a Blog Post:

The China Written Works Copyright Society (CWWCS) has called on Chinese writers to stand up for their legal rights in the face of Web search giant Google’s proposed book settlement, according to a post published on the official Web site of Chinese Writers’ Association (CWA).

CWWCS claimed to have found copyrighted works written by a number of Chinese writers scanned and posted to Google’s digital library, Google Books.

[Snip]

A Google spokeswoman said, “Google Books promotes and encourages book sales – helping to ensure that authors and publishers are rewarded for their creative efforts. Our goal remains bringing millions of the world’s difficult-to-find, out-of-print books back to life. … The scope of our U.S. settlement is limited to the U.S. and comes under U.S. law and only U.S. readers will benefit. Of course, we listen carefully to all concerns and will work hard to address them.”

A Google FAQ on its book program gave more detail about how compensation works for non-US authors:

“Holders of U.S. copyrights world-wide can register their works with the Book Rights Registry and receive compensation from institutional subscriptions, book sales, ad revenues and other possible sources, as well as a cash payment if their works have already been digitized. For example, a foreign author whose book was published outside the U.S. can register with the Book Rights Registry, and receive compensation, if that book is in the collection of a U.S. library from which it was digitized.”

Much More in the Complete Blog Post

Source: China Journal (via Wall Street Journal Digits Blog)

Major Digitization Program Announced: NYU Announces Plans to Digitize All Holdings of Bobst Library

Monday, October 19th, 2009

UPDATE 10/21: Library Journal reports that the article in the Washington Square News the other day, which is linked to below, is inaccurate.

Josh Taylor, senior director, communications, NYU Abu Dhabi, responds to what he is quoted as saying in the article. He notes the conversation with the reporter took place by e-mail. Taylor’s comments appear on a Chronicle of Higher Education weblog.

This is a case of a reporter being 100-percent accurate with a quote, but drawing a wholly different conclusion than what she was actually told (or in this case, e-mailed).

The final part of my quote [in the article below] is critical to understanding our long-term thinking on the subject. Digitizing for digitization’s sake isn’t a sound academic or economic strategy. However, digitizing as we identify specific curricular and research needs that would benefit from students and faculty being able to access materials in a city other than their own is an essential component of NYU’s vision for the global network university.

NYU Library dean Carol Mandel today told LJ that “our plan, pending more approvals, is to do some significant selected appropriate digitization projects.” She goes on to tell Library Journal that the number one priority of a digitization project is that it meets “curricular and research needs.” Mandel also lists other factors, including not digitizing content that has been digitized elsewhere, accessibility issues, and having permissions in order.

While Mandel was not ready to discuss the size or cost of the project, she said that it was not, as implied in the article, mass digitization but rather “more akin to digital collection development.” Small projects, she said, could begin this year or next.

From the Article:

————-

With the financial backing of Abu Dhabi, NYU is planning to digitize Bobst Library.

This will be perhaps NYU Abu Dhabi’s most visible change for the university’s Washington Square campus. A digital database of all the holdings in Bobst would serve to connect Abu Dhabi and New York’s research materials.

[Snip]

“We do plan on the future digitization of materials at Bobst, for access by those in Abu Dhabi, and elsewhere in the global network university, as curricular and research needs demand it,” NYUAD spokesman Josh Taylor wrote in an e-mail.

[Snip]

No other university has a completely digitized library, though many universities have made partial steps toward digitization, usually beginning with rare collections.

Kirtas Technologies, a leading company in digitization services, has worked to digitize portions of libraries at Yale and Cornell universities and at the University of Pennsylvania.

While working on a Microsoft-funded project at Yale and Cornell, Kirtas was digitizing three million pages a month — to the tune of 10 or 12 cents per page, according to Marketing Operations Manager Todd Whiting. That adds up to approximately $3 to $4 million a year.

NYU currently has no time frame for when the project will start. The university’s libraries have a combined 5.1 million volumes.

[Snip]

The large-scale digitization projects at Yale and Cornell did encounter some hiccups. According to Whiting, Kirtas had to develop a completely new machine to digitize the pull-out maps and diagrams in many of the rare books at the two universities.

Access the Complete Article
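The cost figure in the article is easy to check. Here is a short Python sketch of the arithmetic, assuming a constant three million pages a month for a full twelve months (our assumption; the article gives only the monthly rate and the per-page price). The function name is ours.

```python
def annual_cost_dollars(pages_per_month, cents_per_page, months=12):
    """Yearly scanning cost in whole dollars, computed in integer cents."""
    total_cents = pages_per_month * cents_per_page * months
    return total_cents // 100


PAGES_PER_MONTH = 3_000_000  # rate quoted in the article

low = annual_cost_dollars(PAGES_PER_MONTH, 10)   # at 10 cents per page
high = annual_cost_dollars(PAGES_PER_MONTH, 12)  # at 12 cents per page

print(f"${low:,} to ${high:,} per year")
```

At those rates the range works out to $3.6 million to $4.32 million a year, so the article’s “approximately $3 to $4 million” is a little on the low side.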

See Also: The NYU school paper, Washington Square News, has published an editorial about the project titled “Digitizing Bobst is Smart Yet Risky.”

From the Editorial:

The WSN Editorial Board thinks the benefits of a digitized library are innumerable. Access to all Bobst stacks through an online, searchable database will help all students and faculty, and the project shows that NYU is on the forefront of embracing technology in academia.

However, some members of our board feel a sense of uneasiness over the project’s funding. The Abu Dhabi government will cover 100 percent of the costs in digitizing Bobst, just as it has covered all costs in building NYU’s campus in Abu Dhabi.

Source: Washington Square News
Hat Tip: Jerome McDonough

European Commission Puts Challenges of Books Digitisation for Authors, Libraries and Consumers on EU’s Agenda

Monday, October 19th, 2009

From the Announcement:

The European Commission today adopted a Communication on Copyright in the Knowledge Economy aiming to tackle the important cultural and legal challenges of mass-scale digitisation and dissemination of books, in particular of European library collections. The Communication was jointly drawn up by Commissioners Charlie McCreevy and Viviane Reding. Digital libraries such as Europeana will provide researchers and consumers across Europe with new ways to gain access to knowledge. For this, however, the EU will need to find a solution for orphan works, whose uncertain copyright status means they often cannot be digitised. Improving the distribution and availability of works for persons with disabilities, particularly the visually impaired, is another cornerstone of the Communication.

On adoption, Commissioners McCreevy and Reding stressed that the debate over the Google Books Settlement in the United States once again has shown that Europe could not afford to be left behind on the digital frontier.

“We must boost Europe as a centre of creativity and innovation. The vast heritage in Europe’s libraries cannot be left to languish but must be made accessible to our citizens”, Commissioner McCreevy, responsible for the Internal Market, stated.

Commissioner Reding, in charge of Information Society and Media, said: “Important digitisation efforts have already started all around the globe. Europe should seize this opportunity to take the lead, and to ensure that books digitisation takes place on the basis of European copyright law, and in full respect of Europe’s cultural diversity. Europe, with its rich cultural heritage, has most to offer and most to win from books digitisation. If we act swiftly, pro-competitive European solutions on books digitisation may well be sooner operational than the solutions presently envisaged under the Google Books Settlement in the United States.”

The announcement goes on to discuss three main issues:

+ Digital Preservation and Dissemination

+ Orphan Works

The digitisation and dissemination of orphan works pose a particular cultural and economic challenge – the absence of a known rightholder means that users are unable to obtain the required authorisation, e.g. a book cannot be digitised. Orphan works represent a substantial part of the collections of Europe’s cultural institutions (e.g., the British Library estimates that 40 percent of its copyrighted collections are orphan). The Commission will now examine this phenomenon more in detail via an impact assessment.

+ Access for Persons with Disabilities

Much More in the Complete Announcement

Source: EUROPA

See Also: Summary: Commission Communication on Copyright in the Knowledge Economy (1 page; PDF)

See Also: Communication from the Commission: Copyright in the Knowledge Economy, October 19, 2009 (10 pages; PDF)

The Daily Princetonian (Princeton U.) Editorial: Going Beyond Google Books

Monday, October 19th, 2009

From the Editorial:

In 2007, Princeton signed on as one of the partner libraries in the Google Books Search project. By the end of this six-year agreement, the University will have sent Google about one million books to be scanned. All of the books — which the University has ensured are in the public domain — will then be available for free on the internet. This is an exciting project, allowing Princeton to share some of its intellectual wealth with readers around the world.

[Snip]

Though there is currently a lawsuit pending against the Google Books Search project by the Authors Guild and the Association of American Publishers, it does not pertain to Princeton’s participation in the project.

[Snip]

But this legal challenge is a reminder that Princeton’s involvement with Google — though a positive and useful partnership — could pose problems in the future. For one, Google, a for-profit corporation, may not be around forever, as it is subject to the intense competition of the technology sector. And though Google’s current goal is to digitize every book ever published, this may not always be the case. It is not unreasonable to imagine that in the future Google may develop a commercial interest in digitizing only works that would appeal to large audiences to make time for its workforce to focus on more profitable ventures.

[Snip]

One way to both reconcile this disparity between the profit motives of Google and the academic goals of Princeton, as well as to contribute to a more stable, long-term initiative, is for Princeton to join the HathiTrust. This promising nonprofit database started by Indiana University and the University of Michigan now includes 25 large university partners intent on creating a permanent database of digitized books not subject to the economic pressures corporations face.

[Snip]

The Google Books Search project has given our library a great head start into the sphere of digital libraries, and at no cost. But to protect the purely academic spirit of digitized libraries, the University should seek alternatives to its participation in the Google project.

Access the Complete Editorial

Source: The Daily Princetonian
Hat Tip: Library Stuff

Brewster Kahle, Co-Founder of the Internet Archive, Named a Visionary Who is Changing Our World

Monday, October 19th, 2009

Congrats Brewster!

The UTNE Reader is out with their list of “50 Visionaries Changing Our World” and Brewster Kahle from the Internet Archive and Open Content Alliance is on the list. You can read Brewster’s entry here and review the entire list here.

His entry includes a link to a Slate article about Kahle from 2005 and an SF Chronicle article also from 2005.

The Internet Archive is not only home to the essential Wayback Machine but also to over 1 million digitized books and thousands of hours of music*** and film. The Internet Archive also runs the Archive-IT program, which works with numerous organizations to archive their specific content.

*** The Internet Archive is home to two music collections. The first contains live music and the other features a wide variety of audio including audiobooks and poetry.

The Book That Contains All Books

Sunday, October 18th, 2009

From a Column by Stephen Marche in the WSJ:

On Monday, the Kindle 2 will become the first e-reader available globally. The only other events as important to the history of the book are the birth of print and the shift from the scroll to bound pages. The e-reader, now widely available, will likely change our thinking and our being as profoundly as the two previous pre-digital manifestations of text.

[Snip]

The introduction of the printing press brought a similarly enormous change to the nature of reading. One of the most interesting figures in that transformation is the great Benedictine scholar Trithemius. He lived in Sponheim in the 15th century and managed to amass a library fully half the size of the Vatican library, an incredible achievement. He was also the author of “In Praise of Scribes,” the foremost defense of scribal practice, in favor of writing things out and against printing them.

[Snip]

But I am immensely excited for the new phase of the book. So far the new technology has been called the “e-reader,” a term obviously picked by engineers, not poets. In literary terms it’s a transbook, by which I mean that it is the book which can contain all books.

[Snip]

We are still in early days, but it is obvious where the transbook is headed: It will eventually provide access to all text that is non-copyright, and to the purchase of every book in or out of “print.” Kindle 2’s boast of being able to hold 1,500 titles will eventually sound as ludicrous as those early ads for floppy disks boasting that they could hold up to 64k of data. We will want everything and we will get it. Possibly there will eventually develop a subscription service, which provides access to all books for a monthly fee. At any rate, a single object will contain the contents of all the world’s libraries. It’s just a matter of when that will happen. And who will profit.

Much More in the Complete Column

Stephen Marche is the pop culture columnist at Esquire magazine.

Source: Wall Street Journal

Newspaper Digitisation News from the British Library: £33m Saves the World’s Greatest Newspaper Collection for the Nation

Saturday, October 17th, 2009

From the News Release:

The British Library has today received a commitment of £33m [that's nearly $54 million/U.S.] from the Government to preserve and make accessible the world’s greatest newspaper collection.

[Snip]

The British Library collects a copy of every local, regional and national newspaper published in the UK, plus 250 international titles. This unparalleled newspaper collection is an unique resource of over 750 million pages and is used for research by 30,000 people – genealogists, local historians and researchers from the creative industries – every year. The collection is used as source material for countless new books, newspapers, television programmes, films, documentaries, academic papers, local history projects and family trees in the UK every year, making a vital contribution to the UK economy.

However the collection is currently housed in dilapidated conditions in Colindale in North London where 15% of the collection is already beyond use and 19% is in peril. The £33m investment will allow the collection to be moved to a state of the art storage facility in Yorkshire while allowing digital and microfilm access to the collection from the British Library’s flagship building at St Pancras in London.

Dame Lynne Brindley, Chief Executive of the British Library, said, “We welcome the commitment to the £33m investment to preserve and make accessible the world’s greatest newspaper collection. This project will secure the collection’s future and benefit the whole nation. It has the full support of the newspaper industry.

[Our emphasis] “Our plans are already advanced with a number of key contractors already in place. We are ‘shovel ready’ and this commitment will allow us to start building in 2010.”

Source: BL

Note: Don’t confuse today’s news with the British Newspapers: 1800-1900 (2 million pages) Collection from the BL, JISC, and Gale/Cengage that went live in June.

See Also: Chronicling America (1880-1922) Newspaper Digitization Project from the Library of Congress and NEH
Over one million pages have been digitized so far.

See Also: Three Companies in the U.S. Digitizing Newspapers Are: NewspaperArchives.com, Google, and ProQuest (they also have several international databases in the U.K. and Canada)

What’s New in Digital Preservation

Thursday, October 15th, 2009

The team at the Digital Curation Centre in the UK is out with its latest “What’s New In Digital Preservation” compilation. The new edition covers May – September 2009 and links to materials about digital preservation from a large number of global sources. If you follow this important topic, this newsletter is a must.

Video and Slides Available from OCR for the Mass Digitisation of Textual Materials Workshop

Thursday, October 15th, 2009

From a Blog Post:

A workshop was held at the University of Bath on 24th September 2009, looking at some of the current issues in using Optical Character Recognition for digitisation, organised in the context of the EU Impact project. Videos, slideshows, notes and questions from the day are now all available from the workshop webpages.

Access the Main Workshop Web Page

Summary notes are available for all sessions.

Sessions Included:

+ OCR Workshop (Video and Slides)
+ Digitisation Overview (Video and Slides)
+ Introduction to OCR (Video)
+ Document Image Analysis for Text Recognition (Slides)
+ Improving and adding value to OCR results – the IMPACT project (Slides)
+ Case Study – British Library/JISC Newspaper project (Video and Slides)
+ Case Study – Targeted Language Resources for the Digitisation of Historical Collections (Video and Slides)
+ Case Study – Using digitised text collections in research and learning (Slides)
+ Panel Session (Q&A, Text Available)

Source: JISC Digitisation