Data sets tagged with "nlp"
Delicious bookmarks, September 2009
A record of all bookmarking activity on delicious.com for a roughly 10-day period in September 2009. Format is JSON, one record per line. There are 1.25 million entries. Download size is 170 MB. Sample record: {"updated": "Tue, 08 Sep 2009 08:45:00 +0000", "links": [{"href": "http://www.mcfc.co.uk/", "type": "text/html", "rel": "alternate"}], ...
Offsite
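Because the Delicious dump is described as JSON with one record per line, it can be streamed line by line instead of being loaded whole. The following is a minimal sketch in Python, assuming only the fields visible in the truncated sample record ("updated" and "links" with "href"); the helper names iter_records and count_link_hosts are hypothetical, and the full records may carry additional fields not shown here.

```python
import json
from collections import Counter
from urllib.parse import urlparse


def iter_records(lines):
    """Yield one dict per non-empty line of JSON-lines input."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def count_link_hosts(records):
    """Count hostnames found in each record's "links" list."""
    hosts = Counter()
    for record in records:
        # "links" and "href" are the only link fields seen in the sample.
        for link in record.get("links", []):
            host = urlparse(link.get("href", "")).netloc
            if host:
                hosts[host] += 1
    return hosts


# A single line built only from the fields visible in the sample record.
sample_lines = [
    '{"updated": "Tue, 08 Sep 2009 08:45:00 +0000", '
    '"links": [{"href": "http://www.mcfc.co.uk/", '
    '"type": "text/html", "rel": "alternate"}]}',
]
hosts = count_link_hosts(iter_records(sample_lines))
```

To run this over the downloaded file, pass an open file handle in place of sample_lines, since iterating over a text file yields one line at a time; the local filename depends on how the download is saved.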
Big Huge Thesaurus API: Access 145,000 Words and Phrases
This site sports a very simple API for retrieving the synonyms for any word, as well as an actual Big Huge Thesaurus. License: You may use the service for any legal and non-slimy purpose* so long as you link to this site in your website or application credits as follows: "Thesaurus service provided by words.bighugelabs.com". THE SERVICE IS PROVIDED “AS IS” WITHOUT WARRANTY ...
Offsite
Linguistic Data Consortium (LDC) - Collection of Linguistic Corpora and Datasets
The Linguistic Data Consortium is an open consortium of universities, companies and government research laboratories. It creates, collects and distributes speech and text databases, lexicons, and other resources for research and development purposes. The University of Pennsylvania is the LDC’s host institution. The LDC was founded in 1992 with a grant from the Advanced ...
Offsite
Ted Pedersen - Name Discrimination Data / Name Disambiguation Data / Name Ambiguity Data / Named Ent
Contains data where ambiguous entity names in text have been disambiguated. The data has either been manually disambiguated, or created by conflating multiple names into a single ambiguous pseudo-name.
Offsite
OpenCalais API
The OpenCalais Web Service automatically creates rich semantic metadata for the content you submit – in well under a second. Using natural language processing (NLP), machine learning and other methods, Calais analyzes your document and finds the entities within it. But Calais goes well beyond classic entity identification and returns the facts and events hidden within ...
Offsite
Parse.ly 30 million news headlines and summaries from 500K web sources (30M entries, >20GB data)
Since 2009, Parse.ly’s crawlers have fetched nearly 30M articles from thousands of sources across the web. This InfoChimps exclusive dump provides web application developers, computational linguists, researchers, and other interested parties with ...
$350.00
