Archive for the ‘Search Tools’ Category
Thursday, May 1st, 2008
SpotSigs: Robust and Efficient Near Duplicate Detection in Large Web Collections
8 pages; PDF.
From the abstract:
Motivated by our work with political scientists who need to manually analyze large Web archives of news sites, we present SpotSigs, a new algorithm for extracting and matching signatures for near duplicate detection in large Web crawls. Our spot signatures are designed to favor natural language portions of Web pages over advertisements and navigational bars.
The contributions of SpotSigs are twofold: 1) by combining stopword antecedents with short chains of adjacent content terms, we create robust document signatures with a natural ability to filter out noisy components of Web pages that would otherwise distract pure n-gram-based approaches such as Shingling; 2) we provide an exact and efficient self-tuning matching algorithm that exploits a novel combination of collection partitioning and inverted index pruning for high-dimensional similarity search. Experiments confirm an increase in combined precision and recall of more than 24 percent over state-of-the-art approaches such as Shingling or I-Match and up to a factor of 3 faster execution times than Locality Sensitive Hashing (LSH), over a demonstrative “Gold Set” of manually assessed near-duplicate news articles as well as the TREC WT10g Web collection.
Source: Stanford InfoLab
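To make the first contribution concrete, here is a rough Python sketch of the signature idea as the abstract describes it: anchor on a stopword, then keep the short chain of content terms that follows. It is our own illustration, not the authors' code. The antecedent and stopword lists, the chain length of 2, and the function names are all assumptions, and the partitioning and index-pruning matcher from the paper is replaced here by a plain Jaccard comparison of signature sets.

```python
import re

# Assumed, deliberately tiny word lists; the paper's actual lists may differ.
ANTECEDENTS = {"a", "an", "the", "is", "was", "there", "do", "can", "to", "and", "that"}
STOPWORDS = ANTECEDENTS | {"of", "in", "on", "at", "for", "by", "it", "be"}


def spot_signatures(text, chain_length=2):
    """Return the set of 'antecedent:term:term' signatures found in text."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    signatures = set()
    for i, token in enumerate(tokens):
        if token not in ANTECEDENTS:
            continue
        # Collect the next content terms after the antecedent, skipping stopwords.
        chain = [t for t in tokens[i + 1:] if t not in STOPWORDS][:chain_length]
        if len(chain) == chain_length:
            signatures.add(token + ":" + ":".join(chain))
    return signatures


def jaccard(a, b):
    """Plain set Jaccard similarity, standing in for the paper's matcher."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)
```

Text without stopwords, such as a navigation bar reading "Home News Sports Contact", yields no signatures at all in this sketch, which is the filtering effect the abstract points to.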
Posted in Information Science, News Search, Search News, Technology and Internet | No Comments »
Thursday, April 17th, 2008
Barry Schwartz writes:
Live Search News takes a more linear view of news, when you compare it to the Yahoo News home pages. Live Search News looks more like a Techmeme style news approach, but it obviously uses a different algorithm.
Direct to Live Search News
Source: Search Engine Land
See Also:
Two More Excellent News Resources:
1) NewsNow
2) Topix
Posted in News Search, Search Tools, Source File | No Comments »
Wednesday, April 16th, 2008
From a blog post overview:
You can now enter an annual salary in the keyword search box to find all jobs we estimate pay at least that much. To find marketing manager positions paying over $60,000 per year, for example, search Marketing Manager $60,000.
Source: Indeed.com
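For the curious, a query like the one in the example could be split into keywords and a salary floor along these lines. This is only a guess at the general idea, written in Python; Indeed has not published how its search box handles salaries, and the pattern and the parse_query name are hypothetical.

```python
import re

# Assumed format: a dollar sign followed by digits, with optional thousands commas.
SALARY_PATTERN = re.compile(r"\$(\d{1,3}(?:,\d{3})+|\d+)")


def parse_query(query):
    """Split a search string into (keywords, minimum annual salary or None)."""
    match = SALARY_PATTERN.search(query)
    if match is None:
        return query.strip(), None
    min_salary = int(match.group(1).replace(",", ""))
    # Remove the salary token and tidy up the remaining whitespace.
    remainder = query[:match.start()] + " " + query[match.end():]
    return " ".join(remainder.split()), min_salary
```

With this sketch, parse_query("Marketing Manager $60,000") gives the keywords "Marketing Manager" and a minimum salary of 60000.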
Posted in Business and Economics, Search Tools, Source File | No Comments »
Friday, March 28th, 2008
An online news search pioneer releases some new technology. We’re going to give it a whirl.
From the announcement:
Rocketnews.com goes further, working with news seekers to bring them what they are looking for by creating easy to configure, user-defined feeds from a database of over 60,000 sources, and growing…Rocketnews.com introduces the Topic Discovery Engine, which expands a contextual search to include blog posts, photos, video clips and research data, besides an abundance of updated and historical news. The Topic Discovery Engine examines all 60,000 news sources; it collects, analyzes and categorizes news stories; and then updates category pages, topic pages and related RSS feeds. Topic pages, a new feature at Rocketnews.com, highlight popular news topics by displaying related news stories, blog posts, photos and noteworthy quotes.
Source: News Release
Posted in News Search, Search Tools, Source File | No Comments »
Sunday, March 23rd, 2008
eufeeds: over 300 newspapers updated every 20 minutes
From RSS4Lib:
EUFeeds is a special-purpose RSS aggregator for European newspapers that provides access to more than 300 papers from the European Union. Provided by the European Journalism Centre in the Netherlands, this site lets you quickly browse the print media from each EU member nation.
The site defaults to UK newspapers; there is no apparent way to set a different country as your default entry page. It also does not provide an RSS feed for the aggregated content — so you cannot subscribe to the aggregated Czech Republic news, only visit it on a web page.
Posted in News Search, RSS, Source File | No Comments »
Saturday, March 22nd, 2008
The folks at Melissa Data have just placed a new email location database online at no charge. After entering the email address, the database will tell you where the mail server is located. Of course, this does not guarantee that the sender is located in the same place. For example, the mail server might be located in the UK but the sender is in the U.S.
Direct to Email Lookup Database Interface
Displays the city, state, country & a map of an email address.
Review All Melissa Data Lookup Databases
Source: Melissa Data
Posted in Databases, Directories, and Guides, People Search, Search Tools, Source File | No Comments »
Wednesday, March 19th, 2008
SearchMedica Offers Medical Professionals Six New Specialized Clinical Web Searches
From the news release:
SearchMedica adds cardiovascular, diabetes/endocrine, infectious disease, musculoskeletal, pediatric, and respiratory disease categories to cancer/hemic, mental/nervous system and general medicine.
Direct to SearchMedica
Posted in Databases, Directories, and Guides, Science, Search Tools, Source File | No Comments »
Monday, March 17th, 2008
Chronicling America Newspaper Site Adds More Pages, Features
From the announcement:
More than 79,000 newly digitized newspaper pages, along with several new site features, have recently been added to the Chronicling America Web site at www.loc.gov/chroniclingamerica/. With this update, the site now provides access to more than 500,000 digitized newspaper pages, dating primarily from 1900 to 1910, and representing 61 newspapers from California, the District of Columbia, Florida, Kentucky, New York, Utah and Virginia. Chronicling America is a project of the National Digital Newspaper Program (NDNP), which is a partnership between the Library of Congress and the National Endowment for the Humanities (NEH).
New features in Chronicling America include:
+ “See All Available Newspapers” page - A list of all newspapers with pages available on the site.
+ RSS feed and E-mail Update service - Users can subscribe to Real Simple Syndication (RSS) updates or e-mail delivery at www.loc.gov/rss/ (see list under Topics/Newspapers and Journalism). Updates will include notices of added content and other points of interest.
Make sure to see the news release with links to a few highlights from the database.
Source: LC
Posted in Databases, Directories, and Guides, Digitization Projects, History, Libraries and Librarianship, News Search, Search News | No Comments »
Friday, March 14th, 2008
From the announcement:
CrossRef, the multi-publisher linking platform, announced today that Mekentosj, creator of Papers, had signed on as a CrossRef affiliate in order to integrate DOIs and CrossRef metadata into its services. Papers is an award-winning application for researchers that improves their Mac-based workflow for searching, downloading, and managing PDF articles.
Papers already uses the DOI as a standard way to identify and lookup scientific articles. With the new partnership, Papers will add a tighter integration with Crossref’s OpenURL service to facilitate the discovery of both new and existing scientific publications. As a result of the CrossRef integration, Papers can recognize the DOI in PDF files and on web-pages, and automatically retrieve the available bibliographic information, including title, authors and journal names, from Crossref’s metadata database. With one click, this information is then added to the researcher’s personal library, making scientific articles more accessible and manageable.
Source: CrossRef
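Recognizing a DOI in text pulled from a PDF or web page usually comes down to pattern matching. The Python sketch below shows one common approach. It is not how Papers does it; the pattern is a widely used heuristic for modern DOIs rather than the full DOI syntax, the find_dois name is our own, and stripping trailing punctuation will occasionally clip a DOI that really ends in one of those characters. Retrieving the bibliographic metadata from CrossRef is left out entirely.

```python
import re

# Heuristic pattern for modern DOIs: "10.", a 4-9 digit registrant code, "/", then a suffix.
DOI_PATTERN = re.compile(r"\b10\.\d{4,9}/[-._;()/:A-Za-z0-9]+")


def find_dois(text):
    """Return the distinct DOI-like strings in text, in order of first appearance."""
    found = []
    for match in DOI_PATTERN.finditer(text):
        # Drop sentence punctuation that the pattern may have swallowed.
        doi = match.group(0).rstrip(".,;)")
        if doi not in found:
            found.append(doi)
    return found
```

Run over the sentence "See doi:10.1000/xyz123 and https://doi.org/10.1234/abc.def-5." the function returns 10.1000/xyz123 and 10.1234/abc.def-5.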
Posted in Information Industry, Search News, Search Tools, Software and Web-Based Applications | No Comments »
Thursday, January 31st, 2008
A full review will be coming soon on ResourceShelf.
What is it?
1) Available for IE only!
2) The gClick™ button allows readers to dynamically extract real-time comprehensive intelligence — on companies, executives, and events — from any Web page with the click of button. Within seconds, you can go from scanning an article, anywhere on the web, to viewing in-depth information about the companies and executives referenced in the article.
3) gClick gathers real-time, contextual business intelligence from any story or HTML page by clicking on the button or using imbedded links.
More here. The technology comes from a company named Generate Inc. American City Business Journals became a “Strategic Investor” in Generate Inc. in 2005.
Here are two screen caps of gClick in action using a WSJ story. It works with all content, not only American City Biz Journals material.
1 (the story itself) ||| 2 (clicking on a company mentioned in the story)
Worth a look, and more is coming from RS in the future about gClick. It’s a free app, btw. We hope a Firefox version is also in the works.
Posted in Business and Economics, Search Tools, Software and Web-Based Applications, Source File | No Comments »
Saturday, January 12th, 2008
New Health Topic Resources from MedlinePlus: Diabetes Complications
Source: MedlinePlus
Posted in Search Tools, Source File | No Comments »