OCLC Opens Up the Entire WorldCat Database to Web Engines and Other Partners

By Gary Price and Steven M. Cohen
OCLC announced today that it is expanding the Open WorldCat program and offering the entire WorldCat database to Google, Yahoo, and other partners to crawl.

Barbara Quint has all of the details in this ITI NewsBreak. It’s a must-read!

Today’s announcement is good news. Potentially, more books will circulate, since more WorldCat records will be findable via Google, Yahoo, and other partners. At the same time, let’s hope we can also use this as a marketing tool for all libraries.

Will Google and Yahoo want all of this material? Let’s hope so. Fifty million records is a lot of content. The announcement doesn’t say whether Google and Yahoo have committed to crawling all of the records, or how long it will take to get the material into their indexes.

If the engines commit to crawling, harvesting of some of the material can begin in about a month. After that, it will be up to the engines how long it takes to get all of this material into their indexes. Remember: As of today, the entire WorldCat database is not indexed in Google or Yahoo.

Here’s the official language:
“Harvesting by partners will occur gradually over a period of time.” Read the full text of the OCLC Open WorldCat Fact Sheet.

What kind of arrangements will OCLC make with Google and Yahoo about how often the data will be updated and new material harvested? We think new records would need to be harvested at least once a week. OCLC could use RSS and provide feeds to Yahoo and Google (or other engines) so new content could be indexed on the fly. Feedster, for example, has built a useful engine by indexing RSS feeds rather than HTML.
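To make the RSS suggestion a little more concrete, here is a minimal Python sketch of a weekly "new records" feed that an engine could poll. It is only an illustration under our own assumptions: OCLC has announced no such feed, and the record fields (`oclc_number`, `title`, `added`), the function name `build_new_records_feed`, and the example.org URLs are all hypothetical placeholders.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime


def build_new_records_feed(records, feed_url, since):
    """Return an RSS 2.0 document (as a string) listing records added on or after `since`.

    `records` is a list of dicts with hypothetical keys:
    'oclc_number', 'title', and 'added' (a timezone-aware datetime).
    """
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = "New WorldCat records (hypothetical feed)"
    ET.SubElement(channel, "link").text = feed_url
    ET.SubElement(channel, "description").text = "Records added since the last harvest"
    for rec in records:
        if rec["added"] < since:
            continue  # older records were already announced in an earlier feed
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = rec["title"]
        # Placeholder URL pattern; not a real OCLC endpoint.
        link = f"https://example.org/record/{rec['oclc_number']}"
        ET.SubElement(item, "link").text = link
        ET.SubElement(item, "guid").text = link
        # RSS 2.0 expects RFC 822-style dates, which format_datetime produces.
        ET.SubElement(item, "pubDate").text = format_datetime(rec["added"])
    return ET.tostring(rss, encoding="unicode")


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    sample = [
        {"oclc_number": "1", "title": "A new record", "added": now - timedelta(days=2)},
        {"oclc_number": "2", "title": "An older record", "added": now - timedelta(days=30)},
    ]
    # Weekly cadence: include only records added in the last seven days.
    print(build_new_records_feed(sample, "https://example.org/feed", since=now - timedelta(days=7)))
```

The point of the design is that an engine only has to fetch one small document per week to learn about new records, instead of re-crawling fifty million pages to find what changed.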

Over the weekend, we reviewed the Open WorldCat Fact Sheet and had a few thoughts. Consider them constructive criticisms and ideas that might make the program even better.

Open WorldCat is here and growing. What can we do to make it better? Remember, books are for use!


+ Participation in Open WorldCat (making your library’s holdings visible) is only available if your library buys access to WorldCat on FirstSearch. This makes sense; OCLC is trying to protect itself from libraries taking but not giving back. In other words, participation in Open WorldCat is a member benefit. You can read more about it in this letter from OCLC CEO Jay Jordan to members.

+ The fact sheet notes the increase in clicks on Open WorldCat records (3.4 million last month), which is good news.

+ However, only about 8% of those clicks (roughly 272,000) led to actual visits to holdings information pages. We wonder how many of those click-throughs came from librarians doing bibliographic verification. What we would really like to know is how many items were circulated, or even “influenced,” after they were first discovered via Open WorldCat. Did people actually get the books? What was the end result? Compare this process to Amazon or, even better, to the ROI when people search local library catalogs. Let’s hope the new statistical tools OCLC plans to offer will answer these and other questions.

+ Will local libraries spend less time and money maintaining and upgrading their own catalogs and just tell patrons to go to Google or Yahoo? Do local OPACs still have search value for the typical patron? Is it worth the money to provide access to this service when all of the material is available on the web, through an interface the public already knows?

+ One thing we would like to see (which would make Open WorldCat records even more useful to the public) is hyperlinked subject headings, i.e., headings that are both visible and clickable. This would let a searcher get a list of all books with a particular subject heading.

+ OCLC must help improve subject access to material. We ran several subject-type searches (random topics) over the weekend (even including the word “library” in some of them) and got very poor results at both Yahoo and Google. Is one engine better than the other? Yes. It looks like you have a “better” chance with Yahoo. The searches we ran, along with the results, are accessible here.

+ Want to make Open WorldCat better and easier to use? How about OCLC working with Clusty and allowing it to crawl the material? Not only would this be a new outlet (more visibility), but the clustering would help with subject access. We’re going to mention this to Clusty’s CEO. Clusters could be based on subject headings and other parts of the record. This is just what Vivisimo does with ClusterMed (using MeSH). We think working with GuruNet and others would also be a good idea.

+ What about RLG’s RedLightGreen site? Since RLG also provides access to Google, RLG will be indirectly providing access to OCLC records. Will this cause any issues, especially in trying to market RedLightGreen?

+ Since OCLC is making all 52 million records available to Google and Yahoo, will it also make the entire catalog available on the web at WorldCatLibraries.org? OCLC could offer additional user services there, including the ability to create lists, save results, and build a bibliography.

+ The fact sheet makes no mention of what OCLC’s own people have called a problem: where Open WorldCat records fall on search engine results pages. This is a key issue. Will people find (and use) the records if they’re on the second or third page of results?

+ Last week, when I talked with a Google spokesperson about the new Google Print program, I asked (BQ did too!) whether, in addition to providing links to purchase a Google Print item, Google would also provide a direct link to the OCLC record. The response was that they’d have to look into it. Google said the same thing to LJ.

+ OCLC needs to develop a way to show holdings information only for those libraries a user has access to, unless his or her library will pay for an ILL request.

+ If we need to teach the public to use a tool (a bookmarklet) or a specialized interface to reach Open WorldCat via Yahoo or Google, couldn’t we also show them how easy it is to reach individual OPACs via the web? When we do get a chance to teach users how to find library records, should we show them Google or Yahoo (something they’re already familiar with) or the local library’s OPAC?

+ How about Yahoo adding a “library” shortcut? It already does this for gas prices and movie times. In fact, both Yahoo and Google recognize ISBNs entered directly into the search box. In addition to providing links to book merchants, couldn’t they also link to the Open WorldCat record? Ask Jeeves has been very big on creating “ready reference” answers; OCLC should see whether it is interested, too.

+ Worth noting: With the new Google Print program underway, users will in many cases be offered access to full-text books before they even have a chance to see a library record. This is another reason why Open WorldCat records need to appear in the first few search results.

+ Since Google and Yahoo show only one or two results from any single domain, it’s going to be important to teach people how to turn the clustering off (clicking “more results from link…”) or to develop tools that do this and then share them with the public. In other words, a library might make numerous books about a topic available, but the user will only see one or two.

+ We found a problem with Open WorldCat records via Yahoo. Nothing major, but…
Example Search: library books about ben franklin.

An Open WorldCat result appears at #6. A typical searcher will probably not put “library books” in his or her search, but #6 isn’t bad. By the way, if you remove the word “library” from the search, that result drops to #65.

OCLC WorldCat has other Ben Franklin books available, but Yahoo shows only one result per domain. So I click “see more results.” Will the typical searcher do this? Not sure, but let’s continue.

You’re now seeing the same record. Where are the others? Well, you’ll need to click one more time on the “repeat the search with the omitted results included” link.
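The hidden records in the Ben Franklin example come from per-domain filtering of results. The Python sketch below is a simplified, hypothetical model of that filtering, not how Yahoo or Google actually do it (their rules are not public); the function name `crowd_by_host`, the `per_host` parameter, and the example URLs are our own. It walks the ranked results in order, keeps at most `per_host` from each host, and sets the rest aside. Passing `per_host=None` roughly corresponds to the “repeat the search with the omitted results included” link.

```python
from collections import Counter
from urllib.parse import urlparse


def crowd_by_host(ranked_urls, per_host=1):
    """Split ranked results into (shown, omitted), keeping at most `per_host` per host.

    A simplified, hypothetical model of "one result per domain" filtering.
    per_host=None disables filtering, like re-running the search with the
    omitted results included.
    """
    counts = Counter()
    shown, omitted = [], []
    for url in ranked_urls:  # input is assumed to already be in rank order
        host = urlparse(url).netloc.lower()
        if per_host is None or counts[host] < per_host:
            shown.append(url)
            counts[host] += 1
        else:
            omitted.append(url)
    return shown, omitted


if __name__ == "__main__":
    results = [
        "https://www.example.com/franklin-bio",
        "https://catalog.example.org/record/1",  # first library record: shown
        "https://catalog.example.org/record/2",  # same host: omitted by default
        "https://www.example.net/franklin-quotes",
        "https://catalog.example.org/record/3",  # same host: omitted by default
    ]
    shown, omitted = crowd_by_host(results)
    print(len(shown), "shown;", len(omitted), "omitted")
```

Under this model, a library catalog with dozens of relevant records on one domain surfaces only one of them (or two, with `per_host=2`) unless the searcher explicitly asks for the omitted results, which is exactly why the extra clicks matter.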
