Data sets tagged with "english"

Children Who Speak a Language Other Than English at Home: 2000 to 2004

The Statistical Abstract files are distributed by the US Census Bureau as Microsoft Excel files. These files have data mixed with notes and references, multiple tables per sheet, and, worst of all, table headers that are not easily matched to their rows and columns. A few files had extraneous characters in the title. These were corrected to be consistent. A few files ...
Free

The Kids Open Dictionary Builder

About From the creators: > The purpose of this project is to create a free, open, simple dictionary for students to use. The words in the dictionary will be reviewed for quality and appropriateness and ultimately “frozen” for export into a variety of formats, including text, PDF, ebooks, wikis, web, etc., for use on a variety of platforms. > The site also includes a ...
Offsite

80 Million Tiny Images

Visual dictionary presents a visualization of all the nouns in the English language arranged by semantic meaning. Each of the tiles in the mosaic is an arithmetic average of images relating to one of 53,464 nouns. The images for each word were obtained using Google’s Image Search and other engines. A total of 7,527,697 images were used, each tile being the average of ...
Offsite

Dict.cc - English German Dictionary

About From [about page](http://www.dict.cc/?s=about%3A): > dict.cc is not only an online dictionary. It’s an attempt to create a platform where users from all over the world can share their knowledge in the field of translations. Every visitor can suggest new translations and correct or confirm other users’ suggestions. The challenging and most important part of ...
Offsite

MIDAS - Heritage project

From the website: > What is MIDAS? > MIDAS sets out an agreed list of the items or ‘units’ of information that should be included in an inventory or other systematic record of the historic environment. These units of information are grouped together under broad headings or ‘information schemes’. These cover areas such as Monument Character, Events, People and ...
Offsite

The JMdict (Japanese-Multilingual Dictionary) project

About Overview: > The JMdict (Japanese-Multilingual Dictionary) project has at its aim the compilation of a multilingual lexical database with Japanese as the pivot language. The project began in 1999 as an offshoot of the EDICT Japanese-English Electronic Dictionary project. It involved a major rebuild of the main files, with a more complex structure using XML. > ...
Offsite

Renascence Editions

These [public domain works] are provided for nonprofit purposes only; unique site content is copyright ©1992-2007 the editors and The University of Oregon. Corrections and comments to the publisher, Risa Stephanie Bear, M.S., M.A., rbear[at]uoregon.edu…. Early Modern texts published by Renascence Editions are not peer reviewed. While we have done our best to ensure ...
Offsite

Eurfa

About Eurfa consists of an English-Welsh and a Welsh-English dictionary. There are currently around 13,000 words. Authored by Kevin Donnelly, it is currently being used in the development of [Apertium-cy](http://www.cymraeg.org.uk), a Welsh-English translator. For further information, see [the project website](http://eurfa.org.uk/) and [this blog ...
Offsite

Ding: German-English Dictionary

About German-English dictionary from Frank Richter at [Chemnitz University of Technology](http://www.tu-chemnitz.de/). It has been maintained since 1995 (see the [readme file](http://ftp.tu-chemnitz.de/pub/Local/urz/ding/de-en/Readme)). There are now over 216,000 entries. Format: .txt. Access/re-use: licensed under GPL.
Offsite

Oxford English Dictionary (OED)

Scans of the first edition of the Oxford English Dictionary, along with some software to search those scans. [The post](http://lists.canonical.org/pipermail/kragen-tol/2006-March/000816.html) details work up to volume 6 (as of March 2006). It is not clear whether any more digitization has been done since then, but a search of the Internet Archive (where the scans are ...
Offsite

Linguistic Data Consortium (LDC) - Collection of Linguistic Corpora and Datasets

The Linguistic Data Consortium is an open consortium of universities, companies and government research laboratories. It creates, collects and distributes speech and text databases, lexicons, and other resources for research and development purposes. The University of Pennsylvania is the LDC’s host institution. The LDC was founded in 1992 with a grant from the Advanced ...
Offsite

Word Lists Collection

The data is a smorgasbord of word lists, including spell check oriented word lists, an inflection database, parts of speech word list, jargon file word lists, the contents from Ispell, spell check dictionaries, tables that convert between American, British and Canadian spellings, and links to several other word lists.
Offsite

Speech Accent Archive: 1200+ speech samples from a variety of language backgrounds

The speech accent archive uniformly presents a large set of speech samples from a variety of language backgrounds. Native and non-native speakers of English read the same paragraph and are carefully transcribed. The archive is used by people who wish to compare and analyze the accents of different English speakers. The Elicitation Paragraph: Please call Stella. Ask her ...
Offsite

Word Frequencies in Written & Spoken English from British National Corpus (100M-word)

By Geoffrey Leech, Paul Rayson, and Andrew Wilson. Books of English word frequencies have in the past suffered from severe limitations of sample size and breadth. They have also tended to be restricted to word forms alone. Most importantly, almost all have dealt only with written language. This book overcomes these limitations. It is derived from ...
Offsite

English Case Infobox

This dataset consists of a collection of Infoboxes from Wikipedia on the topic of English Case Infobox.
Free
