Archive for February, 2003

Oregon: Bill proposes library porn filters

Friday, February 28th, 2003

Public Libraries–Filtering
Source: The Oregonian
Oregon: “Bill proposes library porn filters”
From the article, “Libraries would be required to block inappropriate Internet sites from young patrons under a bill introduced in the Oregon Senate. Sen. Charles Starr, R-Hillsboro, said this week he introduced Senate Bill 656 because constituents expressed concern about children viewing online pornography at public libraries. “We’re a lot more familiar with the Internet and what’s out there than we were a year ago or five years ago,” Starr said. “As this public knowledge and awareness increases, there are more concerns about what we’re exposing our children to.” Internet filters have been a hot issue for the past year in Oregon and Southwest Washington. Librarians from Multnomah County and Vancouver led two lawsuits challenging the federal Children’s Internet Protection Act, or CIPA, which would pull federal money from libraries that don’t filter access at all Internet terminals. A federal court in Philadelphia ruled in the librarians’ favor last year, and the U.S. government appealed. The Supreme Court plans to hear arguments Wednesday. State governments aren’t waiting to hear from the high court. Oregon probably will join other states debating bills that are somewhat similar to the protection act.”
See Also: Read the Full-Text of Oregon Senate Bill 656

Friday, February 28th, 2003

Professional Reading Shelf
Libraries
Source: Council on Library and Information Resources
The March/April Issue of CLIR Issues is Now Online
Includes an excerpt from Deanna Marcum’s paper, “Realizing the Potential of Digital Libraries”

Friday, February 28th, 2003

Resources, Reports, Tools, and Full-Text Documents (4 Items)
Wealth–Lists & Rankings
Source: Forbes
Now Available, The World’s Richest People Rankings
The 17th annual list is online. Searchable. Lists by region and a “power ranking” that “weighs political connections, business scope, media coverage, philanthropy and net worth”.

Digitization Projects–Canada
Now Available: The Digitized Diaries of William Mackenzie King
Last May we mentioned that a project was underway to digitize this material. Today, news from Cold North Wind that the project is complete. From the announcement, “The diaries, which King kept on a regular basis from 1893 while as a student at the University of Toronto until shortly before his death in 1950, provide an intimate glimpse into one of Canada’s most unique and dominant political figures. The diaries are available on the National Archives website as part of a retrospective on King’s life. The diaries are searchable by term or phrase as well as by date and page number. The diaries exist in three different forms: the original handwritten diary; a typed transcript and an abridged typed transcript. In total there are about 50,000 images.”
See Also: Direct to The Diary Home Page (via National Archives of Canada)

Health Information
New MEDLINEplus Compilations
* Child Mental Health
* Cesarean Section
Also Available: Directory of “Access to Electronic Health Information” Projects Funded by NLM in 2003

Updates from Two Very Useful Content Acquisition Resources are Available
* Internet Resources Newsletter (March, 2003)
* BUBL Link 5/15 (February, 2003)

Thursday, February 27th, 2003

Web Resources of the Week
1) Collapsing Results With Web Search Engines
A couple of weeks ago, Bill Dedman, a Pulitzer Prize-winning journalist who also compiles PowerReporting.Com, reminded me about the presentation of results on just about all general web engines. Nothing new here, but it’s often forgotten by many searchers, including me. Before we get started, I need to thank my friend and colleague, Greg Notess, for some additional facts about site collapsing that were incorporated into the article.

The Issue
Do you realize that in many cases you are only seeing two results from any one web site when you do a web search? Why? Result page “clustering” or “collapsing” is done to help reduce visible duplicates but it could also cause you to miss useful material.
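
To make the idea concrete, here is a minimal sketch of what site collapsing amounts to. It is not any engine's actual code: the function name collapse_by_site, the per_site limit, and the example.org URLs are all made up for illustration. It simply walks a ranked list and keeps the first two hits per host, setting the rest aside.

```python
from urllib.parse import urlparse


def collapse_by_site(urls, per_site=2):
    """Split a ranked list of result URLs into (shown, hidden).

    Hypothetical helper: keeps at most `per_site` results per host,
    in ranked order, and collects everything else in `hidden`.
    """
    seen = {}
    shown, hidden = [], []
    for url in urls:
        host = urlparse(url).netloc.lower()
        seen[host] = seen.get(host, 0) + 1
        if seen[host] <= per_site:
            shown.append(url)
        else:
            hidden.append(url)
    return shown, hidden


# Made-up ranked results: three pages from one site, one from another.
ranked = [
    "http://www.example.org/a",
    "http://www.example.org/b",
    "http://www.example.org/c",
    "http://www.example.com/x",
]
shown, hidden = collapse_by_site(ranked)
```

The hidden list is the material you only see after clicking a “see more results” link.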

Examples
Let’s do a Google search for “National Library of Canada” and “British Library”. You’ll notice that the third and fourth results come from the Library of Congress site (Loc.Gov). The second result from a particular site is always indented. However, the Loc.Gov site has many more pages that might be of interest. Because of site collapsing/clustering, Google only shows two results from any one web site. YOU MUST click the “see more results” link to view all of the hits from the Loc.Gov site that contain your search terms. When you do, you’ll find 139 more hits after Google creates and runs a site-restricted search. To “turn off” the site collapsing feature with Google, add &filter=0 to the end of a Google search URL. If there are fewer than 800 or so results, you can go to the last page of the results and click on the “repeat the search with the omitted results included” link.
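
If you build search URLs in a script, adding the filter=0 parameter can be done along these lines. This is only a sketch: the helper name with_filter_off and the sample search URL are my own assumptions, and the only detail taken from above is the filter=0 parameter itself.

```python
from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse


def with_filter_off(search_url):
    """Return search_url with filter=0 as its last query parameter.

    Hypothetical helper: any existing `filter` value is dropped first,
    so calling it twice gives the same URL.
    """
    parts = urlparse(search_url)
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "filter"
    ]
    query.append(("filter", "0"))
    return urlunparse(parts._replace(query=urlencode(query)))


# Sample URL assumed for illustration.
url = with_filter_off("http://www.google.com/search?q=%22British+Library%22")
```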

Other Engines
* AllTheWeb also collapses results (default) but offers you the option to turn this function off. For AllTheWeb, click “Customize Preferences”, then Advanced Settings, then “Site Collapsing” from the home page. Unlike Google, AllTheWeb will include clustered results later on in the results lists. In other words, pages that have been clustered will show up later in their relevance-ranked position, at least sometimes. However, many people only look at the first few, very few, results.
* AltaVista and Teoma also collapse results. To turn the collapse function off with AV, use the check box on the Advanced Search page. Teoma offers no option but does offer a “see more results from” link below the second hit.
* By default, MSN Search shows all results, but you can limit to only one result per site (with no link to view all material) by selecting the box on the Advanced Page.
Bottom Line: Be aware of this issue and use the “see more results” link to view all of the content from a specific web site.
2) AltaVista News, Searching Beyond Thirty Days
I’ve mentioned on several occasions that news search from AltaVista continues to develop into a favorite. Have you noticed that AV News now offers an option to limit by date or date range? Although I don’t recommend limiting by date for general web searching, it can be very useful with news since every article has a specific publication date associated with it. One of the date limits at AV is “anytime”. Another limit allows you to search by using a range of dates. What does Anytime mean? While many news engines contain only about 2-4 weeks of news, AltaVista’s archive goes back well beyond 30 days. This doesn’t mean it’s time to start canceling your fee-based services. If older content is available, it’s because the various news organizations are keeping the links active. AV checks the urls regularly to see if links are still “hot”. In a time of declining budgets, we might as well maximize what free and low cost content is still available.
More Specifics
Andreas Hartmann from AV tells Resourceshelf, “The archive contains approximately 4 million URLs (of fully indexed articles from a variety of sources) which are older than 30 days. URLs are checked every 2-4 weeks for 404s or other issues.” Content comes from several sources including a Moreover feed. Additionally, AV is now crawling selected news sites on its own. Finally, you can use all of AV’s advanced syntax with news search. This includes the proximity operators NEAR and Within. Search Engine Showdown has a complete list of the AV syntax.

Thursday, February 27th, 2003

Web Search
Source: The Washington Post
Cherchez The Search Advantage
A couple of comments about Leslie Walker’s article. She writes, “Another kind of search revenue also is growing fast. Called “paid inclusion,” it allows advertisers to pay to make sure their Web pages are included in the automated programs that crawl the Web’s links and index its pages. The Web has grown so big that search engines have difficulty scanning all of it. So most of them — Google is the main exception — let companies pay to ensure they are included in what is scanned. Sponsored results, by contrast, do more than ensure a listing, they guarantee greater visibility through higher placement in the results.”
-
O.K., let’s review. Paid inclusion has been around for a couple of years. Inktomi got the ball rolling in March of 2001. However, paid inclusion DOES NOT mean that content that is NOT PAID for isn’t in the index. Most of the content in AllTheWeb, Teoma, MSN Search, and AltaVista is not paid inclusion. The companies that do pay are guaranteeing that the crawler indexes and reindexes the material quickly. In most cases, organizations pay for paid inclusion per url. In the past few months we’re seeing more and more rapid recrawling of all material, paid inclusion or not, by all of the engines. Is it where it needs to be? Not at all. But it’s getting better. Walker also makes “sponsored results” seem a bit devious. This is not true. In this case, results are clearly labeled sponsored (the labeling is much better than in the past) and appear either at the top of the result listings or in the right-hand column. These sponsored listings do NOT influence the actual search results. Finally, the size of the web does continue to grow but this is not the only reason why material might not be indexed. Furthermore, even if it WERE in one place, how easy would it be to access ANY of it without the proper tools to get it out of the database and utilize the data? In other words, specialized engines designed around a specific data set will still be useful. The challenge (and it’s a big one) is knowing what’s available and being able to get to it quickly. Great information professionals have always understood the tools they had to work with. No different here.

Thursday, February 27th, 2003

Resources, Reports, Tools, and Full-Text Documents (3 Items)
World Trade Center
Source: Lower Manhattan Development Council
Selected Design for the World Trade Center Site
Includes slide presentation, background reports, and more.

Philanthropy–United States–Statistics
Source: Foundation Center
New, Highlights: Foundation Giving Trends, 2001

Energy–Iraq
Source: Energy Information Administration
Update, The EIA Has Updated Their Iraq Country Analysis Brief
Data, maps, and links about Iraq’s energy sector.

Wednesday, February 26th, 2003

Public Libraries–United States
Time for Another Library Budget Crisis Update

Here are a few selected articles with recently published stories from around the country.
Georgia
“Libraries cinching belts during budget crunch”
-
Hawaii
“Lawmakers fret over libraries”
-
Montana
“Libraries across the state are taking a big hit”

-
New York
“Pataki’s budget plan a threat to libraries”
“Libraries battle proposed funding cuts”
New York City
“Librarians Fight Cuts”
-
North Carolina
“Budget woes may cause library cuts”
-
Oregon
“Salem Public Library budget is on chopping block”
-
South Carolina
“Budget cuts whittle library selections”

Texas
Reader J.F. writes to ResourceShelf about budget cuts in Texas, “These are very dark times for the academic and public libraries in Texas. Although we haven’t heard (and won’t for some time) about what the final impact the budget cuts will be on TexShare, the news so far is very bad. Just to give you one example: the already announced cuts in support for higher education in Texas means that the budget for the databases funded by the DCCCD libraries will be cut by 80%. Translated into dollar terms, our database budget has been reduced from $91,000 to $18,000. The clock has been turned back ten years. In the fall of 2003, the DCCCD libraries will offer fewer online databases than it did in 1993.”

Wednesday, February 26th, 2003

Public Libraries–United Kingdom
Learn About: learndirect
From the announcement, “The People’s Network, the project to connect all public libraries to the internet, and learndirect, a scheme to provide learning online and information services, have jointly launched an information initiative for public libraries in England. The new facility available on the People’s Network website enables all public libraries to introduce their users to learndirect and the many online learning opportunities it provides.”
See Also: Direct to the learndirect Web Site

Wednesday, February 26th, 2003

State Library–Florida
Source: Tallahassee Democrat
A New Plan for The State Library of Florida
From the article, “A deal announced Tuesday between Gov. Jeb Bush and Nova Southeastern University would give the private college Florida’s $10 million State Library circulating collection – along with $5 million to move and maintain it. The announcement was a surprise to many State Library supporters, made just weeks after the state refused to come up with any funds to help Florida State University, a public institution a couple of blocks away, take over the collection. Nova is located in Fort Lauderdale.”

Google Awarded Patent

Wednesday, February 26th, 2003

Web Search–Google
A Couple of Google Briefs
Web Search
1) Where in the World is Steve Lawrence?
Yesterday, I mentioned that Gary Flake, formerly of NEC Research, was now chief scientist at Overture working to bring their newly acquired search technologies together. We can now confirm a former colleague, co-author, and another big name in information retrieval, Steve Lawrence, is working at another web search company, Google.

2) Be Careful If You Use Google as a Verb
From the Detroit News, “When your search engine becomes ubiquitous — and it has a cool name — this is bound to happen: The American Dialect Society has voted the term “to google” as the most useful word of 2002. Naturally, in this age of lawyers, patents and nearly perpetual copyrights, the folks at Google — the search engine — have sent letters and orders directing that ‘googling’ — the newly-minted verb — is off limits.” I wonder if Kleenex, Jello, or Xerox will come after me. (-: Actually, it’s kind of sad that Google IS web search and research for many people, but it once again reflects the poor marketing that traditional research providers (libraries and information vendors) have done in the age of the Internet. You can read the full-text of Google’s letter here.
See Also: Barbara Quint’s 2002 Article, “Google: (v.)…”. This article was published about one year ago.

3) Google Awarded Patent
Awarded yesterday, “Ranking search results by reranking the results based on local inter-connectivity”. The inventor listed on the patent is Krishna Bharat. Mr. Bharat is also the primary developer of Google News. Stanford University was assigned the U.S. Patent for PageRank.

New Site, Columbia Accident Investigation Board

Wednesday, February 26th, 2003

Resources, Reports, Tools, and Full-Text Documents (4 Items)
Iraq
Source: CIA
Full-Text Report, Putting Noncombatants at Risk: Saddam’s Use of “Human Shields”

Patents–United States–Statistics
Source: USPTO
Top 10 Universities Receiving Most Patents in 2002
Thanks to S.C. for the news tip.

Space Shuttle
New Site, Columbia Accident Investigation Board
Thanks to S.A. for the tip.

Parliament–Canada
Source: Library of Parliament
New Site, Library of Parliament: Background Resources for Educators

divine Files For Bankruptcy

Tuesday, February 25th, 2003

Information Industry–divine
divine Files For Bankruptcy Protection
Yes, it’s true but not a big surprise. This time we can’t say the best laid plans go to waste since no one really knew what divine’s plans were. Here’s the press release, the SEC filing from EDGAR, and a bit more from the Chicago Tribune and Crain’s Chicago Business. From the article, “GTCR [a Chicago area venture cap firm] principal Phil Canfield said today that his firm, whose founder Bruce Rauner has long-time ties to Divine’s CEO Andrew “Flip” Filipowski, has signed a letter of intent to buy substantially all of Divine’s assets for about $50 million.”…The sale would need to be approved by the Bankruptcy Court. “The company was established to ride a wave that had passed,” said Chicago entrepreneur Josh Schneider. “As hard as everybody fought, it was too little too late. It’s like one person steering a paddle boat in a tidal wave.”

AllTheWeb Acquired By Overture

Tuesday, February 25th, 2003

Web Search–AllTheWeb
Another Week, Another Search Engine: AllTheWeb Acquired By Overture
One week after Overture acquired AltaVista, the company has announced another acquisition, this time AllTheWeb from FAST Search and Transfer. Overture, a pay-for-performance search company, will acquire FAST’s Internet business unit for $70 million in cash, as well as performance-based cash incentive payments for up to $30 million over three years. From the announcement, “Under the terms of the agreement, Overture will acquire FAST’s Internet business unit assets including FAST Web Search, AlltheWeb.com, and FAST PartnerSite products, related intellectual property rights, as well as data centers and equipment in Sacramento (USA) and London (UK). In addition, the FAST Internet business unit personnel will transfer to Overture.” What Overture will do with the AllTheWeb database is to be determined. In just 14 days, we’ve seen two of the best web engines, in terms of “search power”, acquired by Overture. You can see a few examples of the power in last week’s AV post. FAST will now focus on and develop its Data Search product. It accounts for over 75% of FAST’s current revenue.

More details to follow. Only time will tell what this means for all of us. Speculation will once again run rampant but much, if not all of it, will be on what this means for web advertisers.

Good News: Gary Flake was named by Overture to bring together the technologies of Overture, AltaVista, and AllTheWeb. This is good news for searchers. Why? Dr. Flake previously worked at NEC Research. Some of the most useful specialized web search tools, including Research Index and the metasearch tools Inquirus and Inquirus2, were developed there (all three remain online, but direct links to Inquirus are not allowed).
A few of Gary Flake’s publications:
“Using Web Structure for Classifying and Describing Web Pages”, 2002
“Improving Category Specific Web Search by Learning Query Modifications”, 2001
“Efficient Identification of Web Communities”, 2000
“DEADLINER: Building a New Niche Search Engine”, 2000
“Self-Organization and Identification of Web Communities”, 2002
Finally, it’s also important to mention that AltaVista has another highly respected web search/info retrieval researcher/scientist/developer on its team, Jan Pedersen.

A quick review of the web search world from a researcher’s perspective: Google (its own database, its own technology), Teoma/Jeeves (its own database, its own technology), Inktomi (now part of Yahoo, operates MSN Search and an Inktomi installation on Hotbot), AltaVista (now part of Overture; as of last week, they told me the company would continue to operate as a separate entity). Search Wars 2003 are here. The more I think about it, all of this consolidation COULD be a good thing for those of us who use the web as a research tool. Stay tuned and buckle up.

Tuesday, February 25th, 2003

Library Associations
IFLA Annual Conference Gets a New Name
“The biggest annual international gathering of librarians, the IFLA [International Federation of Library Associations] Conference, is to be known in future as the “World Library and Information Congress”. IFLA’s Governing Board decided on the new title last year. It was due to come into effect with the 2004 event to be held in Buenos Aires, Argentina. However, it has now been decided to adopt it this year, for the meeting due to take place in Berlin, Germany, 1-9 August 2003. The new title is designed to ensure that the event has a greater impact outside the profession and in the city in which it takes place.”

Tuesday, February 25th, 2003

Professional Reading Shelf
OCLC
Now Available: OCLC Annual Report, 2001-2002

Word scans indicate new ways of searching the Web

Monday, February 24th, 2003

Online Searching
NOTE: After You Read the Background Article, Visit This New Page from Daypop, Where You Can See Word Bursts From Weblogs.

“Word scans indicate new ways of searching the Web”
When Dr. Jon Kleinberg of Cornell University talks or writes about searching, it’s more than worth listening. In a recent presentation to the American Association for the Advancement of Science, Dr. Kleinberg discussed research on how “burstiness” might be a new tool for incorporation into relevancy measures. From the announcement, “[Kleinberg] devised a search algorithm that looks for “burstiness,” measuring not just the number of times words appear, but the rate of increase in those numbers over time. Programs based on his algorithm can scan text that varies with time and flag the most “bursty” words. “The method is motivated by probability models used to analyze the behavior of communication networks, where burstiness occurs in the traffic due to congestion and hot spots,” he explains…For searching the Web, Kleinberg suggests, such a technique could help zero in on what a searcher wants by recognizing the time context of such material as news stories. For instance, he says, a person searching for the word “sniper” today is likely to be looking for information about the recent attacks around the nation’s capital — but the same search nearly four decades ago might have come from someone interested in the Kennedy assassination.” Dr. Kleinberg was a researcher on IBM’s often discussed but never publicly released Clever project. Many of the underlying concepts from Clever are being utilized by Teoma.
See Also: “Word ‘Bursts’ Could Help Refine Web Searches” (via Scientific American 2.19.03)
See Also: “Hypersearching the Web” (via Scientific American 6.1999)
Co-authored by Kleinberg, this is one of the better papers on web search aimed at a non-technical audience I’ve read. Again, many of the concepts discussed re: Clever are being used at Teoma.
See Also: “Clever New Way to Search?” (via Wired 11.27.98)
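
For readers who like to see an idea in code, here is a toy illustration of “burstiness” as the announcement describes it: looking at the rate of increase in word counts over time rather than the raw counts. To be clear, this is NOT Dr. Kleinberg’s algorithm, which rests on probability models; it is a simple growth-rate comparison between two time windows. The function name burst_scores, the scoring formula, and the sample headlines are all assumptions made up for this sketch.

```python
from collections import Counter


def burst_scores(earlier_docs, recent_docs):
    """Score each recent word by how fast its count grew.

    Simplified stand-in, not Kleinberg's method:
    score = (recent count - earlier count) / (earlier count + 1).
    """
    before = Counter(w for doc in earlier_docs for w in doc.lower().split())
    after = Counter(w for doc in recent_docs for w in doc.lower().split())
    return {w: (after[w] - before[w]) / (before[w] + 1) for w in after}


# Made-up headlines from an earlier and a more recent time window.
earlier = ["budget vote delayed", "library budget hearing"]
recent = ["sniper suspect sought", "sniper attacks continue", "budget vote today"]

scores = burst_scores(earlier, recent)
top = max(scores, key=scores.get)  # the "burstiest" word in this toy data
```

A word that appears often in both windows scores low here, while a word that suddenly shows up scores high, which is the intuition behind flagging “bursty” terms.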

Funding for Australian Databases Is Drying Up

Monday, February 24th, 2003

Databases–Australia
Source: The Age
“The data drought”
“Water is not the only resource that is drying up. In the midst of the drought, public access to information is also at risk as funding for some of Australia’s most precious databases slows to a trickle.”

Monday, February 24th, 2003

Resources, Reports, Tools, and Full-Text Documents
Education–Demographics–United States
Source: NCES
Census 2000 Data Now Mapped to School District Boundaries
“Census 2000 school district demographics are now available by school district boundaries. View demographic data about children and their living environment, by school district, from Census 2000 data from PL1-SF1 and SF3 data sets.”

More on the Google Purchase of Blogger

Monday, February 24th, 2003

Web Search–Google
Source: The NY Times
More on the Google Purchase of Blogger
A week after everyone else makes a comment or two, The NY Times gets in on the act. Like I said last week, Google (like any other search engine) can have near instant access to just about any weblog (if the owner of the blog wants to make it available) via a site like weblogs.com. Anytime a weblog publisher hits the publish button (like I do with ResourceShelf), it sends an update “ping” to weblogs.com. So, Google or any other engine can use weblogs.com as a tool to tell their crawlers when to recrawl and reindex a site. The NY Times article mentions that other weblog software is out there. Google would be making a mistake if they choose to only reindex Blogger content on a constant basis. Finally, the article makes no mention of Daypop, a search engine that’s been indexing blog content for a couple of years.
See Also: My Comments Re: Why Blogger & Google From 2.16.02
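
For the curious, the update “ping” mentioned above is a small XML-RPC message. The sketch below only builds the message and sends nothing over the network. The method name weblogUpdates.ping is, to the best of my knowledge, the one weblogs.com accepts, but treat that as an assumption; the helper name build_ping and the weblog name and URL are placeholders.

```python
import xmlrpc.client


def build_ping(weblog_name, weblog_url):
    """Build the XML-RPC request body for an update ping.

    Assumption: the receiving service expects a method called
    weblogUpdates.ping taking (weblog name, weblog URL).
    """
    return xmlrpc.client.dumps(
        (weblog_name, weblog_url), methodname="weblogUpdates.ping"
    )


# Placeholder weblog details, for illustration only.
payload = build_ping("Example Weblog", "http://www.example.com/")

# Decode the payload again to confirm what a receiver would see.
params, method = xmlrpc.client.loads(payload)
```

A search engine watching a ping service only needs the URL from each message to know which site to recrawl.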

In Florida, “Bush, librarians reach tentative deal”

Sunday, February 23rd, 2003

State Libraries–Florida
Source: Florida Today
Florida: “Bush, librarians reach tentative deal”
“The State Library, which might face dismantling to save money, could get a reprieve. Gov. Jeb Bush’s administration planned to do away with it to save about $5.4 million a year by transferring much of the library possibly to a university. After protests from librarians and historians, the administration is now working on an agreement that would keep the archives, the library’s historic collection and the state museum in state government hands.” But former State Librarian Barratt Wilkins warned that this might only be a temporary agreement. He told about 65 professional librarians and historians at the Cocoa Civic Center on Wednesday that if the governor dismantles the State Library, it would affect every library in the state because federal and state funding flows through the state library.