Data sets tagged with "documents"

Corpus of Erotica Stories

An excellent resource for natural language processing and machine learning work. This corpus consists of 4,771 raw-text erotica stories collected from www.textfiles.com/sex/EROTICA. Growing naturally out of the encouragement of writing on BBSes, people have been writing some form of erotica or sexual narrative for one another for quite some time. With the advent of Fidonet ...
Free

EU - Susta Info

Overview from the [front page](http://magenta.collexis.net/susta-info/en/index.aspx):

> Susta-Info is a global database of case studies and publications, validated by research institutes, ‘associations of cities’ and expert groups. Susta-Info is an EU DG Research supported project in the context of the Sixth Framework Program, with priority 1.1.6.3: Global ...
Offsite

EUROPA - Register of Commission documents

Overview:

> The register contains references both of documents which have already been published and of internal (unpublished) Commission documents, from the 1st January 2001. Information in register includes: the identifier or reference number, the title of the document in the languages in which it is available, the date of the document, the languages in ...
Offsite

Wikisource

“Wikisource is an online library of free content publications collected and maintained by the community (see our inclusion policy).”
Offsite

20 Newsgroups Dataset (De-Duped Version)

The 20 Newsgroups data set is a collection of approximately 20,000 newsgroup documents, partitioned (nearly) evenly across 20 different newsgroups. It is believed to have been originally collected by Ken Lang, probably for his paper "NewsWeeder: Learning to Filter Netnews", though he does not explicitly mention this collection there. The 20 Newsgroups collection has become a ...
Free

Stack Overflow Data Dump - Posts, Comments, Users, Votes & Badges

Stack Overflow Creative Commons Data Dump: We decided early on that all user-generated content on Stack Overflow would be under a Creative Commons license. All those great Stack Overflow questions, answers, and comments, so generously contributed by all of you, are licensed under cc-wiki. You are free to Share (to copy, distribute, and transmit the work) and to Remix ...
Offsite

Digging into Data - Various Repositories

A list of digital libraries, data archives, and data repositories that are inviting Digging into Data researchers to use their collections. For each repository, you’ll find a description of their contents, contact information, and other details.
Offsite