OpenCalais (OC) is a free service that we first mentioned six months ago and have mentioned several times since. This post from June 2009 mentions some of the organizations using the service from Thomson Reuters.
In a nutshell, OpenCalais uses semantic technology and natural language processing to analyze text and add metadata by drawing out entities from documents, blog posts, news stories, etc. In some cases, this type of data can identify or help identify relationships between people, businesses, etc.
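To make "drawing out entities" a little more concrete, here is a toy sketch in Python. It is not how OpenCalais works and it does not use the OpenCalais API. The lookup table, the type labels, the function name `tag_entities` and the sample sentence are all made up for illustration. A real service relies on natural language processing rather than a fixed list of names.

```python
import re

# Hypothetical, hand-made lookup table (an assumption for this sketch).
# A real semantic service does not depend on a fixed list like this.
GAZETTEER = {
    "Florida": "ProvinceOrState",
    "Arcadia": "City",
    "Halloween": "Holiday",
    "clean energy": "IndustryTerm",
}


def tag_entities(text):
    """Return sorted (name, type) pairs for known names found in text."""
    found = set()
    for name, kind in GAZETTEER.items():
        # Whole-word, case-insensitive match on each known name.
        if re.search(r"\b" + re.escape(name) + r"\b", text, re.IGNORECASE):
            found.add((name, kind))
    return sorted(found)


# Made-up sample sentence, not a quote from the address.
sample = "Florida announced new clean energy projects near Arcadia."
for name, kind in tag_entities(sample):
    print(f"{kind}: {name}")
```

Even this crude version shows why the metadata is useful: once names are tagged with a type, they can be counted, linked and searched.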
A visualization tool might make OpenCalais even more powerful. For example, it might be interesting to pair it with visualization tools like Muckety or NNDB Mapper to quickly “see” relationships that might go unnoticed without OpenCalais or other services.
Sure, it would be wonderful if all web content could be analyzed by a human and then have high-quality metadata associated with it.
However, that’s far from possible given the massive amount of content generated each minute of each day.
You can try OpenCalais yourself by typing or pasting text into the viewer box.
We entered the full text of last Saturday’s Weekly Presidential Address and got back lots of stats and commentary.
+ Topic (Labor). Worth noting that we did not put the title of the address in the viewer box. The title is “President Obama Says Recovery Act Creating Jobs and Strengthening Economy.”
+ Social Tags: Labor, Unemployment, Presidency of Barack Obama, etc.
+ Entities, including:
+ Cities (Arcadia, FL), which is mentioned in the speech
+ Holiday (he ends by wishing everyone a Happy Halloween)
+ Continent: America (well, we’ll cut it some slack; close but incorrect)
+ Industry Terms: Clean energy, Less Energy (good)
+ Province or State: Florida, again accurate
Finally, Events & Facts:
+ Generic Relations (announce, Florida, United States, the largest set of). At first we were puzzled. Then, by cursoring over the entry, we saw that it’s Florida having the largest set of clean energy projects.
Btw, if you cursor over any of the entities you’ll find additional info.
For example, with Florida we find a relevance score and the lat/long for Arcadia, FL, the town mentioned in the address.
Although we did not see them in our document, OC might also provide direct links to Wikipedia, CIA World Factbook, etc.
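The hover details described above can be pictured as a small record per entity: a name, a type, a relevance score and, for places, a lat/long. The Python sketch below is only our way of picturing that data. The class `Entity`, the helper `most_relevant`, the 0-to-1 relevance scale and every number in it are assumptions for illustration. They are not the OpenCalais output format or the values the viewer showed us.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entity:
    name: str
    kind: str                          # e.g. "City", "ProvinceOrState"
    relevance: float                   # assumed here to run from 0.0 to 1.0
    latitude: Optional[float] = None   # only set for places
    longitude: Optional[float] = None


def most_relevant(entities, minimum=0.5):
    """Keep entities at or above a relevance cutoff, highest first."""
    kept = [e for e in entities if e.relevance >= minimum]
    return sorted(kept, key=lambda e: e.relevance, reverse=True)


# Placeholder values for illustration only, not what the viewer returned.
entities = [
    Entity("Florida", "ProvinceOrState", 0.6),
    Entity("Arcadia", "City", 0.4, latitude=27.2, longitude=-81.9),
    Entity("Halloween", "Holiday", 0.1),
]

for e in most_relevant(entities, minimum=0.3):
    print(e.kind, e.name, e.relevance)
```

A publisher could use a cutoff like this to keep only the tags that matter most to a story and drop passing mentions.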
Overall, very good. But it’s just one example, and one example search does not make a service.
One question that we would like to get an answer to is why Thomson Reuters is providing free access to OpenCalais. Does it plan to charge for additional services in the future?
UPDATE: Krista Thomas from OpenCalais sent along the following goals in a Twitter message.
1) Better software faster.
2) Connect all the world’s business information.
For bloggers, OC offers a WordPress plug-in, a service for Drupal users and more. The WordPress tool analyzes blog postings and suggests tags and even images from Flickr.
Other services have technology that draws out indexing terms, descriptors, etc., but OpenCalais appears to be much more sophisticated. Somewhat similar is Silobreaker news search. Silobreaker’s algorithm draws out entities from stories and then makes them clickable or searchable. It also offers a couple of cool visualization tools.
Krista Thomas from OpenCalais recently gave a presentation to the San Diego Software Industry Council. Krista’s slides are available online. The charts on pages 4 and 5 are difficult to read, so we’re trying to get copies to share.
At the present time:
+ 18,000 Developers
+ 20+ Publishers
+ 50 Apps and Services Created
+ 4 million docs processed daily
Again, you can try OpenCalais yourself by typing or pasting text into the viewer box.
Finally, here’s one more OC example using the content from this post.
Overall, it’s easy to see how this service could be of value to the individual blogger and, even more so, to publishing companies with a non-stop stream of content.