The Comprehensive Knowledge Archive Network (CKAN) Collection
Description:
From their website:
CKAN is the Comprehensive Knowledge Archive Network, a registry of open knowledge packages and projects (and a few closed ones)…Those familiar with freshmeat, CPAN or PyPI can think of CKAN as providing an analogous service for open knowledge…CKAN is developed and maintained by the Open Knowledge Foundation. Both the CKAN code and data are open: free for anyone to use and reuse. To find out more check out the CKAN project at knowledgeforge.net
CKAN is a peer in the global data commons and Infochimps is proud to be able to mirror their collection of over 300 datasets.
Created over 2 years ago by Infochimps
Updated over 2 years ago
US Census Bureau TIGER data
The US government’s ‘Topologically Integrated Geographic Encoding and Referencing’ system, usually referred to as TIGER, is based on an extensive database of US geographic information. It is county-level data that documents physical features like roads and rivers, as well as some administrative features such as Congressional districts. Data can be downloaded for ...
Offsite
National Public Transport Access Node database (NaPTAN)
From the [overview](http://naptan.org.uk/overview.htm): > NaPTAN provides a unique identifier for every point of access to public transport in the UK, together with meaningful text descriptions of the stop point and its location. This enables both computerised transport systems and the general public to find and reference the stop unambiguously. Stops can be related to ...
Offsite
National Land and Property Gazetteer
Description From main site: > The NLPG is the first, definitive, national address list that provides unique identification of properties across England and Wales and conforms to the British Standard, BS 7666. Local government, and potentially the public and private sectors, can link their information systems to this high-quality source of addresses and accurate ...
Offsite
National Street Gazetteer
Description From the [about page](http://www.thensg.org.uk/iansg/link.htm?id=100): > The National Street Gazetteer (NSG) is the definitive reference system used in the notification process and the coordination of street works. Under legislation, each local highway authority in England and Wales is required to create and maintain its own Local Street Gazetteer (LSG) ...
Offsite
Official Journal of the European Community (OJEC)
Discussed at [Workshop on Public Information, 2008-11-02](http://okfn.org/wiki/PublicInformation).
Offsite
Chemical Block
About
ChemBlock makes available two databases:
1. Building Blocks
fields: ID number, Structure, Chemical Name, Salt data
4925 compounds
2. Screening Library
fields: ID number, Structure, Salt data
122051 compounds
Openness
Terms of re-distribution/re-use are not mentioned on the site.
Offsite
Open Shakespeare
The Open Shakespeare package provides a full open set of Shakespeare’s works along with ancillary material, a variety of tools and a Python API. Specifically, in addition to the works themselves (often in multiple versions), there is an introduction, a chronology, explanatory notes, a concordance and search facilities. All material is open source/open knowledge so that ...
Offsite
archive.org - Internet Archive
“The Internet Archive, a 501(c)(3) non-profit, is building a digital library of Internet sites and other cultural artifacts in digital form. Like a paper library, we provide free access to researchers, historians, scholars, and the general public.”
Offsite
Binding DB - The Binding Database
About > BindingDB is a public, web-accessible database of measured binding affinities, focusing chiefly on the interactions of protein considered to be drug-targets with small, drug-like molecules. Openness Not open as restricts commercial re-use: > The database you are about to use is protected under copyright and/or patent law. While you are free to use the data ...
Offsite
Planning Alerts Planning Applications Database
UK Planning Application data from a variety of councils across the UK. More information, plus a full up-to-date list of councils covered, can be found at:
<http://www.planningalerts.com/about.php>
Offsite
Distributed Structure-Searchable Toxicity (DSSTox) Public Database Network
About > Distributed Structure-Searchable Toxicity (DSSTox) Database Network is a project of EPA’s National Center for Computational Toxicology, helping to build a public data foundation for improved structure-activity and predictive toxicology capabilities. The DSSTox website provides a public forum for publishing downloadable, structure-searchable, standardized ...
Offsite
ICONCLASS - Multilingual Thematic Classification
About From the website: > This is an experimental service that makes the ICONCLASS Iconographic Classification system available as linked-data using the SKOS vocabulary. This service is inspired by the excellent Library of Congress Subject Headings linked data service. It is intentionally copied in spirit and conventions used. The idea is to enable others to make ...
Offsite
Discogs: Discographies
Discogs is a community-built database of music information. Imagine a site with discographies of all labels, all artists, all cross-referenced, this is what Discogs strives to be. Here you will find monthly data dumps of Discogs Release, Artist, and Label data. The data is in XML format and formatted according to the API spec. License All material is in the public ...
Offsite
RDFizing and Interlinking the EuroStat Data Set Effort
The statistical data published on riese was originally published by Eurostat.
Offsite
Securities & Exchange Commission's Public Information Server
This server features SEC public documents, information of interest to the investing public, rule-making activities, and access to the Commission’s electronic filing database, EDGAR. The public will be able to query the EDGAR database for any company currently filing electronically with the SEC. These filings are updated 24 hours after they are filed with the ...
Offsite
The 2000 US Census: 1 Billion RDF Triples
2000 U.S. Census converted into over a billion RDF triples.
Offsite
Wordnet
WordNet® is a large lexical database of English, developed under the direction of George A. Miller. Nouns, verbs, adjectives and adverbs are grouped into sets of cognitive synonyms (synsets), each expressing a distinct concept. Synsets are interlinked by means of conceptual-semantic and lexical relations. The resulting network of meaningfully related words and concepts ...
Offsite
EEA - Data service
About Overview: > The data service provides almost all data sets and applications which have been used in EEA’s periodical environmental reports. Topics include: Air emissions Air quality Corine land cover 1990 Corine land cover 2000 EEA owned data sets Land cover accounts Eurosion Nationally designated areas Point data Raster data Geospatial data ...
Offsite
World Values Survey
Description Large global surveys of ‘values’ taking place every five years since 1990 described on its website as “The world’s most comprehensive investigation of Political and Socio-Cultural Change”. Openness: Semi-Open Access: download in bulk is possible as well as analysis on the website. However have to go through terms and conditions (not ...
Offsite
DBTune
“This effort has started in the context of the Linking open data community project of the Semantic Web Education and Outreach Interest Group. Its main purpose is to make available freely available data concerning music on the semantic-web, such as Magnatune, Jamendo, Dogmazic, Mutopia, and to create links between them and other available semantic web repositories, such ...
Offsite
Correlates of War
Description The Correlates of War project hosts a variety of datasets related to the study of inter-state conflict. Details As of 2007-09-22 the following datasets were listed: State System Membership (v2004.1): This data set records the fluctuating composition of the state system since 1816. It also identifies countries corresponding to the standard ...
Offsite
Bulk.resource.org
Bulk.resource.org is a service of public.resource.org. Public.resource.org is a non-profit committed to publishing and sharing public domain materials in the United States. This system contains unsupported, as-is copies of selected U.S. government archives, including: The SEC’s EDGAR Database Commerce Business Daily U.S. Copyright Database Patent Full Text Database ...
Offsite
Statistics Canada
About From [what we do](http://www.statcan.gc.ca/about-apercu/overview-apercu-eng.htm) page: > Statistics Canada, a member of the Industry Portfolio, produces statistics that help Canadians better understand their country—its population, resources, economy, society and culture. Access/re-use [Copyright ...
Offsite
EUROPA - Register of Commission documents
About Overview > The register contains references both of documents which have already been published and of internal (unpublished) Commission documents, from the 1st January 2001. Information in register includes: the identifier or reference number, the title of the document in the languages in which it is available, the date of the document, the languages in ...
Offsite
Places of interest in the London Borough of Sutton
A CSV file of places of interest in the London Borough of Sutton as compiled by Sutton Active.
Currently with 142 items. The CSV file is dynamically generated from the live database with each request. Please cache locally if you require regular access.
No geotags yet but I’m working on it.
Offsite
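The Sutton places-of-interest entry above asks regular users to cache the CSV locally, because the file is regenerated from the live database on every request. A minimal sketch of one way to do that in Python follows. The function names, the one-day freshness window, and the cache file name are illustrative assumptions, not part of the dataset's documentation, and the real download URL is not given in this listing.

```python
import os
import time
import urllib.request
from typing import Callable


def download(url: str) -> bytes:
    """Fetch the raw bytes at `url` (one live request to the server)."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def cached_fetch(
    url: str,
    cache_path: str,
    max_age_seconds: float = 24 * 60 * 60,  # assumed freshness window: one day
    fetch: Callable[[str], bytes] = download,
) -> bytes:
    """Return the local copy if it is fresh enough, otherwise re-download it."""
    if os.path.exists(cache_path):
        age = time.time() - os.path.getmtime(cache_path)
        if age < max_age_seconds:
            with open(cache_path, "rb") as cached:
                return cached.read()

    data = fetch(url)
    # Write to a temporary file first so an interrupted write never
    # leaves a half-written cache file behind.
    tmp_path = cache_path + ".tmp"
    with open(tmp_path, "wb") as tmp:
        tmp.write(data)
    os.replace(tmp_path, cache_path)
    return data
```

Calling `cached_fetch(csv_url, "sutton_places.csv")` repeatedly within the freshness window would hit the server only once; `csv_url` here stands in for whatever address the dataset page actually provides.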
real-time information about the global routing system from the perspectives of several different bac
The University’s Route Views project was originally conceived as a tool for Internet operators to obtain real-time information about the global routing system from the perspectives of several different backbones and locations around the Internet. ...
Free
GeoCommons
Description GeoCommons is a website for uploading and visualizing datasets with a geospatial component (so they can be plotted on a map). The focus is on visualization rather than the data, with the tagline: “Explore, Create and Share Intelligent Maps and Geographic Data” Openness: PASS- License: all datasets licensed under cc by-sa 3.0 Access: data provided in kml ...
Offsite
Open History
Collection of articles – mostly about Japanese history.
Started in 2001 and last updated in 2006-09-18.
Offsite
History Commons
About From [about](http://www.historycommons.org/aboutsite.jsp) page: > What is the History Commons website? > The History Commons website is run by the Center for Grassroots Oversight (“CGO”), an organization that is fiscally sponsored by The Global Center, a 501(c)3 non-profit organization. CGO was incorporated as a public benefit corporation in late 2006, and ...
Offsite
Open Text Book
“Open Text Book is a registry of textbooks and text book material that is open in accordance with the Open Knowledge Definition (OKD).”
Offsite
Wikispecies
“Wikispecies is an open, free directory of species. It covers Animalia, Plantae, Fungi, Bacteria, Archaea, Protista and all other forms of life.”
Offsite
Wikibooks
“Welcome to Wikibooks, a Wikimedia project that was started on July 10, 2003 with the mission to create a free collection of open-content textbooks that anyone can edit.”
Offsite
Wikisource
“Wikisource is an online library of free content publications collected and maintained by the community (see our inclusion policy).”
Offsite
Wikimedia Commons
Over 2 million freely usable media files to which anyone can contribute
Offsite
Given Name Frequency Project
Quite a bit of data is available for download, but only as individual files (not in a single file). According to the web page, they have: > * GINAP – code to standardize given names and correct common problems in name samples. Such standardization is an important step in analysis of given names. > * Popular given names, US 1801 to 1999 – a collection of sets of standardized ...
Offsite
Ekopedia
Ekopedia is “the practical encyclopedia about alternative life techniques”. It is dedicated to providing information related to environmental sustainability.
License
Creative Commons
Offsite
ISO 639-2 - Codes for the Representation of Names of Languages
About From home page: > ISO 639 provides two sets of language codes, one as a two-letter code set (639-1) and another as a three-letter code set (this part of ISO 639) for the representation of names of languages. ISO 639-1 was devised primarily for use in terminology, lexicography and linguistics. This part of ISO 639 represents all languages contained in ISO 639-1 ...
Offsite
Wikinews
“We are a group of volunteers whose mission is to present reliable, unbiased, relevant and entertaining News.
All content is released under a free license. By making our content perpetually available for free redistribution and use, we hope to contribute to a global digital commons.”
Offsite
Wiktionary
“Welcome to the English-language Wiktionary, a collaborative project to produce a free, multilingual dictionary with definitions, etymologies, pronunciations, sample quotations, synonyms, antonyms and translations. Wiktionary is the lexical companion to the open-content encyclopedia Wikipedia.”
Offsite
Wikiquote
“Welcome to Wikiquote, a free online compendium of quotations from notable people and creative works in every language, including sources (where known), translations of non-English quotes, and links to Wikipedia for further information! The English version of Wikiquote has 13,799 pages so far with many thousands of quotations and proverbs.”
Offsite
FreeBMD (Births, Marriages and Deaths)
Description From front page: “FreeBMD is an ongoing project, the aim of which is to transcribe the Civil Registration index of births, marriages and deaths for England and Wales, and to provide free Internet access to the transcribed records.” Openness: NOT OPEN 1. License: access for personal research purposes only. Full T&C below. 2. Access: single ...
Offsite
Open Media Database
About “omdb (open media database) is a free database for film media. There is no set editorial staff, but rather a large number of movie addicts and lovers who volunteer their time to provide material and develop the site. Anybody can add or change existing information on omdb once they have done the quick and simple task of signing up for their user login name. ...
Offsite
Fine Rolls of Henry III
Description From <http://www.finerollshenry3.org.uk/cocoon/frh3/content/about/about.html>: > The Henry III Fine Rolls Project is a three year enterprise commencing in April 2005, funded by the Arts and Humanities Research Council. It aims to publish the Fine Rolls of Henry III from 1216 down to 1248. It is hoped that a second three year project will complete ...
Offsite
Open-Of-Course
Open-Of-Course is a multilingual and interactive portal for open content courses and tutorials. It is based on the free software ELO “Moodle” and people are welcome to add their own open educational content to the system.
Offsite
FreeDict
About Summary from [SourceForge page](http://sourceforge.net/projects/freedict/): > Free translating dictionaries. The data is kept as XML complying to the TEI DTD. This enables to include features such as phonetics, part of speech and etymology information in a project independent format. Access/Re-use Fully open. From the [project ...
Offsite
ChemIDplus
About > This database allows users to search the NLM ChemIDplus database of over 370,000 chemicals. A user may enter compound identifiers such as Chemical Name, CAS Registry Number, Molecular Formula, Classification Code, Locator Code, and Structure or Substructure. New searchable features include search and display by Toxicity indicators such as Median Lethal Dose ...
Offsite
Ancient Geographic Information
Description Datasets produced by the [pleiades project](http://pleiades.stoa.org/about-pleiades): > Organized by the Ancient World Mapping Center at the University of North Carolina at Chapel Hill, U.S.A., Pleiades brings together a global community of scholars, students and enthusiasts to expand and enhance continually the information originally brought together by ...
Offsite
HapMap
Description The International HapMap Project is a partnership of scientists and funding agencies from Canada, China, Japan, Nigeria, the United Kingdom and the United States to develop a public resource that will help researchers find genes associated with human disease and response to pharmaceuticals. Datasets From ...
Offsite
Languages of the World (Multilingual RDF Descriptions)
Description Lingvoj means languages in Esperanto. From the frontpage of <http://www.lingvoj.org/>: http://www.lingvoj.org/lingvoj.rdf is the complete RDF file gathering currently the description of 507 languages, including all languages defined by ISO 639-1 and most of ISO 639-2 codes (a few exceptions remain, for which Wikipedia articles are not consistent with ...
Offsite
Open Font Library
Openness: OPEN
License: SIL OFL (http://openfontlicense.org/)
Access: yes from each page (by hand)
bulk: no
Offsite


