Data sets tagged with "words"
Twitter Census - Conversation Metrics: One Year of URLs, Hashtags, Smileys Usage (Monthly)
Twitter data from millions of tweets! This is a download of Twitter data from March 2006 to November 2009. The data set consists of “tokens,” which are hashtags (#data), URLs, or emoticons (Twitter smileys or other “faces” created using keyboard characters). The data comes from an analysis of the full set of tweets during that time period, which is 35 million ...
$300.00
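The three token types described above can be illustrated with a small extractor. This is a minimal sketch, not the method used to build the data set: the regular expressions and the function name `extract_tokens` are hypothetical, and the emoticon pattern covers only a handful of common ASCII faces.

```python
import re
from collections import Counter

# Hypothetical patterns for illustration; the data set's real tokenizer is not documented here.
URL = re.compile(r"https?://\S+")
HASHTAG = re.compile(r"#\w+")
# A small ASCII emoticon pattern: eyes, optional nose, mouth, not followed by a letter or digit.
SMILEY = re.compile(r"(?<!\S)[:;=8][-o*']?[)(\]\[dDpP/\\|](?!\w)")

def extract_tokens(text):
    """Return a Counter of (kind, token) pairs found in one tweet."""
    counts = Counter()
    for url in URL.findall(text):
        counts[("url", url)] += 1
    # Blank out URLs first so a "#fragment" inside a link is not counted as a hashtag
    rest = URL.sub(" ", text)
    for tag in HASHTAG.findall(rest):
        counts[("hashtag", tag.lower())] += 1
    for face in SMILEY.findall(rest):
        counts[("smiley", face)] += 1
    return counts
```

For example, `extract_tokens("Loving #Data :) see http://example.com/x#frag")` yields one hashtag (`#data`), one URL and one smiley.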
Twitter Census - Conversation Metrics: One Year of URLs, Hashtags, Smileys Usage (by Hour)
Twitter data from millions of tweets! This is a download of Twitter data from March 2006 to November 2009. The data set consists of “tokens,” which are hashtags (#data), URLs, or emoticons (Twitter smileys or other “faces” created using keyboard characters). The data comes from an analysis of the full set of tweets during that time period, which is 40 million ...
$1,000.00
Twitter Census - Conversation Metrics: One Year of URLs, Hashtags, Smileys Usage (Smiley Counts)
Twitter smiley data from millions of tweets! This is a free download of Twitter data from March 2006 to November 2009. The smiley data comes from an analysis of the full set of tweets during that time period, which is 35 million users, over 500 ...
Free
Word List - 100,000+ Official Crossword Words (Excel readable)
A word list with over 100,000 entries that are officially permitted in crossword games like Scrabble™. This word list is available in a simple, alphabetically ordered Excel format, making it convenient for reference, for spell-checking or, in more sophisticated applications, for developers building a custom spelling dictionary. The entries include variants of ...
Free
Word List - 74,000+ Common English Dictionary Words (with Definitions, Excel format)
74,550 common dictionary words: a list of words common to two or more published dictionaries. This gives the developer of a custom spelling checker a good starting pool of relatively common words.
$4.00
Word List - 10,000+ Common Place Names
More than 10,000 U.S. place names. This list is available in a simple, alphabetically ordered .txt format, making it convenient for reference, for spell-checking or, in more sophisticated applications, for developers building a custom location tool or database. The entries represent a sampling of U.S. place names: 10,196 places in total.
Free
Word List - 100,000+ Official Crossword Words (Excel readable)
113,809 official crossword words: a list of words permitted in crossword games such as Scrabble™. Compatible with the first edition of the Official Scrabble Players Dictionary™. Because the list includes all inflected forms of each word (-ing, -ed, -s, and so on), it makes a good addition when building a custom spelling dictionary.
Free
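Several of the lists above are offered as a starting point for a custom spelling dictionary. A minimal sketch of that use, assuming the list has been exported to plain text with one word per line (the Excel-readable files would need that conversion first). `load_word_list` and `check_spelling` are hypothetical names, and the suggestions come from Python's standard `difflib.get_close_matches`, a generic similarity heuristic rather than a dedicated spelling-correction algorithm.

```python
import difflib
import re

def load_word_list(lines):
    """Build a lowercase set from an iterable of lines, one word per line."""
    return {line.strip().lower() for line in lines if line.strip()}

def check_spelling(text, dictionary, max_suggestions=3, cutoff=0.6):
    """Map each word in text that is missing from dictionary to close matches."""
    report = {}
    for word in re.findall(r"[a-z]+", text.lower()):
        if word not in dictionary and word not in report:
            # Linear scan over the whole dictionary: fine for a sketch,
            # slow for 100,000+ entries and many unknown words.
            report[word] = difflib.get_close_matches(
                word, dictionary, n=max_suggestions, cutoff=cutoff)
    return report
```

In practice `load_word_list` would be fed an open file, e.g. `with open("words.txt") as f: words = load_word_list(f)`, where the file name is only a placeholder.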
WordNet
WordNet® is a large lexical database of English, developed under the direction of George A. Miller. Nouns, verbs, adjectives and adverbs are grouped into sets of cognitive synonyms (synsets), each expressing a distinct concept. Synsets are interlinked by means of conceptual-semantic and lexical relations. The resulting network of meaningfully related words and concepts ...
Offsite
A list of all 22,802 words in the Scribblenauts dictionary.
List of summonable objects from the Nintendo DS game Scribblenauts, from AARDVARK, ABOMINABLE SNOWMAN and ABSCONDER to ZOMBIE, ZUNICERATOPS and ZYGOTE. From the Scribblenauts Wikipedia entry: Scribblenauts is an emergent puzzle action video game with the tagline “Write Anything, Solve Everything”. Its objective is to complete puzzles by summoning any object (from a ...
Free
Word List - 350,000+ Simple English Words (Excel readable)
Over 354,000 single words, excluding proper names, acronyms, and compound words or phrases. This list does not exclude archaic words or significant variant spellings.
Free
MySpace User Activity Stream: Word count by day from December 2009 to March 2010
This data is derived from the MySpace real-time stream API. The word counts come from MySpace's free-form text fields: moods, forum topic titles, replies to forum topics, text accompanying a shared link or item, and status mood updates. Words from these fields were extracted over the last three months, and this dataset contains their totals binned by day.
$25.00
MySpace User Activity Stream: Word count by hour from December 2009 to March 2010
This data is derived from the MySpace real-time stream API. The word counts come from MySpace's free-form text fields: moods, forum topic titles, replies to forum topics, text accompanying a shared link or item, and status mood updates. Words from these fields were extracted over the last three months, and this dataset contains their totals binned by hour.
$50.00
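The two MySpace data sets differ only in bin size (day versus hour). A minimal sketch of that binning, assuming raw input as (ISO timestamp, word) pairs; this input shape and the name `bin_word_counts` are hypothetical, since the files' actual layout is not described here.

```python
from collections import Counter
from datetime import datetime

# One bin label per calendar day or per clock hour.
BIN_FORMATS = {"day": "%Y-%m-%d", "hour": "%Y-%m-%d %H:00"}

def bin_word_counts(events, by="day"):
    """Count occurrences of each (lowercased) word per time bin.

    events: iterable of (iso_timestamp, word) pairs; an assumed input shape,
    not the data set's documented format.
    """
    fmt = BIN_FORMATS[by]
    counts = Counter()
    for ts, word in events:
        label = datetime.fromisoformat(ts).strftime(fmt)
        counts[(label, word.lower())] += 1
    return counts
```

The same events give one count per day with `by="day"` and finer-grained counts with `by="hour"`; summing the hourly counts for a day reproduces the daily total.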
AOL Search Data
The AOL Search Data is a collection of real query-log data from real users. The data set consists of 20M web queries collected from 650k users over three months. These private searches are well suited to research and mining. The data is sorted by anonymized user ID and arranged sequentially. The collection can be used for personalization, query reformulation or ...
Free
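Because the AOL data is described as sorted by user ID, each user's queries can be collected in one streaming pass without loading everything into memory. A minimal sketch, assuming tab-separated rows with the user ID in the first column and the query text in the second; that column layout and the name `queries_by_user` are assumptions, so check the actual files (and skip any header row) before relying on it.

```python
import csv
from itertools import groupby

def queries_by_user(lines):
    """Yield (user_id, [queries]) for each run of rows sharing a user ID.

    Assumes tab-separated rows: column 0 = user ID, column 1 = query text
    (a hypothetical layout for illustration).
    """
    # QUOTE_NONE: treat quote characters inside queries as ordinary text
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    rows = (row for row in reader if len(row) >= 2)  # drop blank or short lines
    # groupby merges only *adjacent* rows with equal keys, which suffices
    # here because the data is described as sorted by user ID
    for user_id, group in groupby(rows, key=lambda row: row[0]):
        yield user_id, [row[1] for row in group]
```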
List of Dirty, Obscene, Banned and otherwise unacceptable words
A banned-word list compiled from many lists around the web of words considered socially unacceptable for one reason or another. What to do with a banned-word list? Use this dirty-word list to screen for spammers and griefers; to censor dissidents; to better understand the semiotic role of taboo signifiers in an online modality; to monitor user ...
Free
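Screening text against a banned-word list is usually done on whole words, so that innocent words merely containing a banned string are not flagged. A minimal sketch under that assumption; `find_banned` and `mask_banned` are hypothetical helpers, the mild placeholder "heck" stands in for real list entries, and the approach does not catch deliberate misspellings or multi-word phrases.

```python
import re

# Words are runs of letters and apostrophes; everything else is a separator.
WORD = re.compile(r"[A-Za-z']+")

def find_banned(text, banned_words):
    """Return the sorted banned words that occur in text as whole words."""
    banned = {w.lower() for w in banned_words}
    return sorted({t.lower() for t in WORD.findall(text)} & banned)

def mask_banned(text, banned_words, mask="*"):
    """Replace each whole-word banned match with mask characters of equal length."""
    banned = {w.lower() for w in banned_words}

    def repl(match):
        word = match.group(0)
        return mask * len(word) if word.lower() in banned else word

    return WORD.sub(repl, text)
```

With `["heck"]` as the list, `"What the Heck, heckler?"` is masked to `"What the ****, heckler?"`: the whole word is caught regardless of case, while "heckler" passes.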
80 Million Tiny Images
The Visual Dictionary presents a visualization of all the nouns in the English language, arranged by semantic meaning. Each tile in the mosaic is the arithmetic average of the images associated with one of 53,464 nouns. The images for each word were obtained using Google’s Image Search and other engines. A total of 7,527,697 images were used, each tile being the average of ...
Offsite
A Million Syllabi
A data set of over a million syllabi gathered by Dan Cohen’s Syllabus Finder tool from 2002 to 2009. It may be the largest collection of syllabi ever gathered, by several orders of magnitude.
See a more detailed description on Dan Cohen’s blog
Format
Data are formatted as JSON records, one record per line.
Caution: this data is messy and comes with no warranty.
Free
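Since the syllabi come as one JSON record per line and are flagged as messy, a reader that tolerates bad lines is a reasonable first step. A minimal sketch; `read_json_lines` is a hypothetical name, and no particular record fields are assumed because the schema is not described here.

```python
import json

def read_json_lines(lines):
    """Parse newline-delimited JSON, skipping blank lines and malformed records.

    Returns (records, bad_line_numbers) so skipped lines can be inspected later.
    """
    records, bad = [], []
    for lineno, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError:
            bad.append(lineno)
    return records, bad
```

Passing an open file works because files iterate line by line, e.g. `with open("syllabi.json", encoding="utf-8", errors="replace") as f: records, bad = read_json_lines(f)`; the file name is a placeholder, and `errors="replace"` is one way to survive stray encoding problems.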
Password Dictionary
A list of 1,717,680 passwords, useful for checking whether users are practicing good password hygiene.
Offsite
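A password list like this is typically consulted when a user chooses a new password, rejecting any candidate that appears in it. A minimal sketch, assuming one password per line; `load_passwords` and `is_weak` are hypothetical, and the minimum length of 8 and the extra lowercase comparison are illustrative policy choices, not part of the data set.

```python
def load_passwords(lines):
    """Build a set of passwords, one per line; only the line ending is stripped,
    since leading or trailing spaces can be part of a password."""
    return {line.rstrip("\r\n") for line in lines if line.rstrip("\r\n")}

def is_weak(candidate, common, min_length=8):
    """Flag a candidate that is too short or appears in the list.

    The length threshold and lowercase check are hypothetical policy settings.
    """
    if len(candidate) < min_length:
        return True
    return candidate in common or candidate.lower() in common
```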
Word List - 250,000+ Hyphenated, Capitalized and Compound English words
A common word list with over 250,000 entries of hyphenated, capitalized and compound English words. The download consists of entries containing more than one word, as well as capitalized words and acronyms. Phrases are considered “common” if they or variations of them occur in a standard dictionary or thesaurus. This word list is available in a simple, ...
Free
Word Lists Collection
The data is a smorgasbord of word lists, including spell-check-oriented word lists, an inflection database, a parts-of-speech word list, Jargon File word lists, the contents of Ispell, spell-check dictionaries, tables that convert between American, British and Canadian spellings, and links to several other word lists.
Offsite
Word List - 1000 Most Frequent Words from an Internet Corpus
This file contains the 1,000 English words most frequently used on the Internet in 1992.
Free
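A frequency list like this one comes from counting words across a corpus and keeping the top entries. A minimal sketch of such a count; how the 1992 list was actually compiled is not described here, so the tokenization rule (lowercase letters, with one optional internal apostrophe) and the name `top_words` are assumptions.

```python
import re
from collections import Counter

# Lowercase letter runs, optionally with one internal apostrophe ("dog's").
TOKEN = re.compile(r"[a-z]+(?:'[a-z]+)?")

def top_words(texts, n=1000):
    """Return the n most frequent words across an iterable of texts as (word, count) pairs."""
    counts = Counter()
    for text in texts:
        counts.update(TOKEN.findall(text.lower()))
    return counts.most_common(n)
```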


