Ganglion's profile

Name:
Jacob Perkins

Uploaded datasets

11,000+ YouTube Videos

This dataset is useful for studying the dynamics of threaded comments in rich media sharing, as well as interesting participants in the conversations. This dataset involves a set of about 11,000 videos. Included is information about: tags, number of views, number of comments, ratings, textual content of the comments, and the authors and timestamps of the comments. Citation: ...
Free

20 Newsgroups Dataset (De-Duped Version)

The 20 Newsgroups data set is a collection of approximately 20,000 newsgroup documents, partitioned (nearly) evenly across 20 different newsgroups. It is speculated that it was originally collected by Ken Lang, probably for his paper "Newsweeder: Learning to filter netnews", though he does not explicitly mention this collection. The 20 Newsgroups collection has become a ...
Free

Airports and Their Locations

A list of over 9,000 global and domestic airport locations. Data includes airport code, geographic coordinates, other geo-related data, and data unique to airports like runway length and elevation. Airports and their surrounding areas are hubs of business activity; in many cases they are a tourist’s first glimpse of a city, and in other instances the epicenter of shipments. ...
Free

Amsterdam Museum Data Set (RDF)

The Amsterdam Museum dataset describes more than 70,000 cultural heritage objects related to the city of Amsterdam described by the museum. The metadata was retrieved from an XML Web API of the museum’s Adlib collection database and converted to RDF compliant with the Europeana Data Model (EDM). This makes the Amsterdam Museum data the first of its kind to be ...
Offsite

AOL Search Data

The AOL Search Data is a collection of real query log data that is based on real users. The data set consists of 20M web queries collected from 650k users over three months. These private searches are perfect for research and mining. The data is sorted by anonymous user ID and sequentially arranged. The collection can be used for personalization, query reformulation or ...
Free

C. Elegans Neural Network, Flat Adjacency List

A directed, weighted network representing the neural network of C. Elegans. Data compiled by D. Watts and S. Strogatz and made available on the web here. Please cite D. J. Watts and S. H. Strogatz, Nature 393, 440-442 (1998). Original experimental data taken from J. G. White, E. Southgate, J. N. Thompson, and S. Brenner, Phil. Trans. R. Soc. London 314, 1-340 (1986).
Free

Color List

A comprehensive list of colors that are included in Wikipedia articles about color. This includes the color name, hex triplets, and RGB values. Source: http://en.wikipedia.org/wiki/List_of_colors
Free

Condensed Matter Collaboration Network

Description: Data describes a collaboration network of scientists posting preprints on the condensed matter archive at www.arxiv.org. This version is based on preprints posted to the archive between January 1, 1995 and March 31, 2005. The network is weighted, with weights assigned as described in M. E. J. Newman, Phys. Rev. E 64, 016132 (2001). These data can be cited ...
Free

Corpus of Erotica Stories

Excellent resource for working with natural language processing and machine learning. This corpus consists of 4771 raw text erotica stories collected from www.textfiles.com/sex/EROTICA. As a logical outgrowth of the encouragement of writing on BBSes, people have been writing some form of erotica or sexual narrative for others for quite some time. With the advent of Fidonet ...
Free

Digg.com Data Set

Digg is a social news website. The dataset spans August to November 2008, when Digg’s cornerstone function still consisted of letting people vote stories up or down, called digging and burying, respectively. In the dataset, the social graph contains about 56,000 user-user links among about 10,000 users. This dataset is useful for ...
Free

Disasters worldwide from 1900-2008

Disaster data from 1900 – 2008, organized by start and end date, country (and sub-location), disaster type (and sub-type), disaster name, cost, and persons killed and affected by the disaster. Create disaster data trend reporting, based on geography, frequency, date or nature of the event. Design a visualization or time lapse illustrating disaster events around the ...
Free

Flickr Images

This Flickr data set contains over 2,000 downloaded images from 52 different groups. The information can be utilized for image content analysis in issues related to rich social media. Each image is indexed by its Flickr photo id and the corresponding group to which it belongs. Citation: Choudhury, M. D., Sundaram, H., Lin, Y-R., John, A., and Seligmann, D. D. (2009). ...
Free

Flora of North America

FNA presents for the first time, in one published reference source, information on the names, taxonomic relationships, continent-wide distributions, and morphological characteristics of all native and naturalized plants found in North America north of Mexico. Source: http://www.fna.org/
Offsite

Google Books Ngrams

Description: Here are the datasets backing the Google Books Ngram Viewer. These datasets were generated in July 2009; we will update these datasets as our book scanning continues, and the updated versions will have distinct and persistent version identifiers (20090715 for the current set). Each of the links will directly download a fragment of the given corpus. For ...
Offsite

Hex color codes to RGB values and color names

A simple mapping from hex color codes to color names and RGB values. E.g.:
color, hex, r, g, b
Almond,#EFDECD,239,222,205
Dodger blue,#1E90FF,30,144,255
Meat brown,#E5B73B,229,183,59
Scarlet,#FF2000,255,32,0
Tiffany Blue,#0ABAB5,10,186,181
Violet (color wheel),#7F00FF,127,0,255
Source: http://en.wikipedia.org/wiki/List_of_colors
Free
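
The relationship between the hex column and the r, g, b columns in the sample rows of the hex color dataset can be checked with a few lines of Python. This is only a minimal sketch: the function names `hex_to_rgb` and `find_mismatches` are hypothetical, and the inline sample simply copies the six rows quoted in the description rather than reading the actual download, whose file name and exact layout are not given here.

```python
import csv
import io

# Sample rows copied from the dataset description (not the full file).
SAMPLE = """color, hex, r, g, b
Almond,#EFDECD,239,222,205
Dodger blue,#1E90FF,30,144,255
Meat brown,#E5B73B,229,183,59
Scarlet,#FF2000,255,32,0
Tiffany Blue,#0ABAB5,10,186,181
Violet (color wheel),#7F00FF,127,0,255
"""


def hex_to_rgb(hex_code):
    """Convert a hex triplet such as '#EFDECD' to an (r, g, b) tuple."""
    digits = hex_code.strip().lstrip("#")
    if len(digits) != 6:
        raise ValueError(f"expected six hex digits, got {hex_code!r}")
    # Each pair of hex digits encodes one channel in the range 0-255.
    return tuple(int(digits[i:i + 2], 16) for i in (0, 2, 4))


def find_mismatches(csv_text):
    """Return the color names whose r, g, b columns disagree with the hex column."""
    # skipinitialspace tolerates the spaces after commas in the header row.
    reader = csv.DictReader(io.StringIO(csv_text), skipinitialspace=True)
    mismatches = []
    for row in reader:
        stated = (int(row["r"]), int(row["g"]), int(row["b"]))
        if hex_to_rgb(row["hex"]) != stated:
            mismatches.append(row["color"])
    return mismatches


if __name__ == "__main__":
    print(hex_to_rgb("#EFDECD"))
    print(find_mismatches(SAMPLE))
```

Running the same check over the full file would be a quick way to spot rows where the hex code and the RGB values were transcribed inconsistently.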

Jazz Musicians Network

Description: List of edges of a network of jazz musicians as a flat (.tsv) file. Data compiled by members of the Alex Arenas group (Dept. of Computer Science and Mathematics, Universidad Rovira i Virgili). Please cite P. Gleiser and L. Danon, Adv. Complex Syst. 6, 565 (2003). Fields (short name, type, description): Source, Int, Integer vertex label; Target, Int, ...
Free

Marvel Universe Chronology Project

The MCP is an effort to catalog every actual appearance by every significant character in the Marvel Universe, and place them in their proper chronological order. If there’s a particular character that’s struck your fancy, that you just can’t live without owning their every appearance, this is the place to start. Simply click on the first character of their name ...
Offsite

Marvel Universe Social Graph

A fun Marvel Comics character collaboration graph constructed by Cesc Rosselló, Ricardo Alberich, and Joe Miro from the University of the Balearic Islands. The Marvel Universe, that is, the artificial world that takes place in the universe of the Marvel comic books, is an example of a social collaboration network. They compare the characteristics of this universe to ...
Free

Mushroom Data Set

This data set includes descriptions of hypothetical samples corresponding to 23 species of gilled mushrooms in the Agaricus and Lepiota Family (pp. 500-525). Each species is identified as definitely edible, definitely poisonous, or of unknown edibility and not recommended. This latter class was combined with the poisonous one. The Guide clearly states that there is no ...
Offsite