Data sets tagged with "language"
Word List - 100,000+ Official Crossword Words (Excel readable)
A word list with over 100,000 entries that are officially permitted in crossword games like Scrabble™. This word list is available in a simple, alphabetically ordered Excel format, making it convenient for reference, for spell-checking, or, in more sophisticated applications, for developers looking to build a custom spelling dictionary. The entries include variants of ...
Free
Word List - 74,000+ Common English Dictionary Words (with Definitions, Excel format)
74,550 common dictionary words: a list of words common to two or more published dictionaries. This gives the developer of a custom spelling checker a good starting pool of relatively common words.
$4.00
Word List - 10,000+ Common Place Names
More than 10,000 U.S. place names. This U.S. place name list is available in a simple, alphabetically ordered .txt format, making it convenient for reference, for spell-checking, or, in more sophisticated applications, for developers looking to build a custom location tool or database. The entries represent a sampling of U.S. place names: 10,196 places in total.
Free
Word List - 100,000+ official crossword words (with Definitions, Excel format)
A list of 113,809 words officially permitted in crossword games like Scrabble™, with their definitions. The words are compatible with the first edition of the Official Scrabble Players Dictionary™. Since this list includes word variants (-ing, -ed, -s, and so on), it makes a good addition when building a custom spelling dictionary. It is a reference to have handy for ...
$4.00
Children Who Speak a Language Other Than English at Home: 2000 to 2004
The Statistical Abstract files are distributed by the U.S. Census Bureau as Microsoft Excel files. These files have data mixed with notes and references, multiple tables per sheet, and, worst of all, the table headers are not easily matched to their rows and columns. A few files had extraneous characters in the title. These were corrected to be consistent. A few files ...
Free
Word List - 100,000+ official crossword words (Excel readable)
113,809 official crossword words. A list of words permitted in crossword games such as Scrabble™. Compatible with the first edition of the Official Scrabble Players Dictionary™. Since this list includes all word forms (-ing, -ed, -s, and so on), it makes a good addition when building a custom spelling dictionary.
Free
Language Spoken at Home - Cities of 100,000 or More: 2005
The Statistical Abstract files are distributed by the U.S. Census Bureau as Microsoft Excel files. These files have data mixed with notes and references, multiple tables per sheet, and, worst of all, the table headers are not easily matched to their rows and columns. A few files had extraneous characters in the title. These were corrected to be consistent. A few files ...
Free
Languages Spoken at Home by Language: 2005
The Statistical Abstract files are distributed by the U.S. Census Bureau as Microsoft Excel files. These files have data mixed with notes and references, multiple tables per sheet, and, worst of all, the table headers are not easily matched to their rows and columns. A few files had extraneous characters in the title. These were corrected to be consistent. A few files ...
Free
Word List - 350,000+ Simple English Words (Excel readable)
Over 354,000 single words, excluding proper names, acronyms, or compound words and phrases. This list does not exclude archaic words or significant variant spellings.
Free
Foreign Language Enrollments in Public High Schools by Type of Language: 1970 to 2000
The Statistical Abstract files are distributed by the U.S. Census Bureau as Microsoft Excel files. These files have data mixed with notes and references, multiple tables per sheet, and, worst of all, the table headers are not easily matched to their rows and columns. A few files had extraneous characters in the title. These were corrected to be consistent. A few files ...
Free
Language Spoken at Home by State: 2005
The Statistical Abstract files are distributed by the U.S. Census Bureau as Microsoft Excel files. These files have data mixed with notes and references, multiple tables per sheet, and, worst of all, the table headers are not easily matched to their rows and columns. A few files had extraneous characters in the title. These were corrected to be consistent. A few files ...
Free
TalkBank
About TalkBank: > The goal of TalkBank is to foster fundamental research in the study of human and animal communication. It will construct sample databases within each of the subfields studying communication. It will use these databases to advance the development of standards and tools for creating, sharing, searching, and commenting upon primary materials via ...
Offsite
The Kids Open Dictionary Builder
About From the creators: > The purpose of this project is to create a free, open simple dictionary for students to use. The words in the dictionary will be reviewed for quality and appropriateness and ultimately “frozen” for export into a variety of formats, including text, PDF, ebooks, wikis, web, etc., for use on a variety of platforms. > The site also includes a ...
Offsite
FSI Language Courses
About From website: > Welcome to fsi-language-courses.com, the home for language courses developed by the Foreign Service Institute. These courses were developed by the United States government and are in the public domain. > This site is dedicated to making these language courses freely available in an electronic format. This site is not affiliated in any way with ...
Offsite
Dict.cc - English German Dictionary
About From [about page](http://www.dict.cc/?s=about%3A): > dict.cc is not only an online dictionary. It’s an attempt to create a platform where users from all over the world can share their knowledge in the field of translations. Every visitor can suggest new translations and correct or confirm other users’ suggestions. The challenging and most important part of ...
Offsite
The Speech Accent Archive
From website: > The speech accent archive uniformly presents a large set of speech samples from a variety of language backgrounds. Native and non-native speakers of English read the same paragraph and are carefully transcribed. The archive is used by people who wish to compare and analyze the accents of different English speakers. On [about ...
Offsite
ISO language, territory, currency codes and their translations
Description This is a set of ISO codes including those for country and currency collected together into a useful package by the Debian project. From the package page: > This package provides the ISO-639 Language code list, the ISO-4217 currency list, the ISO-3166 Territory code list, and ISO-3166-2 sub-territory lists. > > It also (more importantly) provides their ...
Offsite
Statistical Machine Translation - Europarl Parallel Corpus
About Overview: > The Europarl parallel corpus is extracted from the proceedings of the European Parliament. It includes versions in 11 European languages: Romanic (French, Italian, Spanish, Portuguese), Germanic (English, Dutch, German, Danish, Swedish), Greek and Finnish. > The goal of the extraction and processing was to generate sentence aligned text for ...
Offsite
The DGT Multilingual Translation Memory of the Acquis Communautaire
As of November 2007, the European Commission’s Directorate-General for Translation (DGT) made publicly accessible its multilingual Translation Memory for the Acquis Communautaire (the body of EU law) – a collection of parallel texts (texts and their translation, also referred to as bi-texts) in 22 languages. This is a page for technical users, where you will find a ...
Offsite
MOCHA-TIMIT
About Authors: Alan Wrench, Queen Margaret University College. Funded by: Engineering and Physical Sciences Research Council. When created: November 1999. Purpose: Phonetically balanced dataset for training an automatic speech recognition system. Openness Availability: English speakers available here free for non-commercial use and may be distributed on CDROM for a ...
Offsite


