License: Creative Commons NC-SA

Data sets with this license (2)

Westbury Lab Usenet Corpus: 28M postings from 47000+ newsgroups 2005-2009

A USENET corpus (2005-2009): This corpus is a collection of public USENET postings. It was collected between Oct 2005 and Jan 2010 and covers 47860 English language, non-binary-file news groups. Despite our best efforts, this corpus includes a very small number of non-English words, non-words, and spelling errors. The corpus is untagged, raw text. It may be ...
Offsite

Speech Accent Archive: 1200+ speech samples from a variety of language backgrounds

The speech accent archive uniformly presents a large set of speech samples from a variety of language backgrounds. Native and non-native speakers of English read the same paragraph, and their readings are carefully transcribed. The archive is used by people who wish to compare and analyze the accents of different English speakers. The Elicitation Paragraph: Please call Stella. Ask her ...
Offsite