Category: Technology
A geolocation API with 20 fields of search results, all customized to your IP query. Search by IP address to return data about a geographical area, including country, region, city, internet connection speed, global coordinates, postal and country codes, time zone, and even daylight saving time observance status. Looking for more dimensions of IP searchable data? Try the ...
API
A reverse IP lookup API with 5 fields of search results, all customized to your IP query. Search by IP address to return data about the domain, company, ISP, NAICS industry code and proxy type for an IP address. Looking for more dimensions of IP searchable data? Try the Geolocation API, returning up to 20 geo data points of custom query information per IP address. Or ...
API
A geolocation API for all your demographics needs. Search by IP address to return data about a geographical area, including number of households, gender, age groups and language. Looking for more dimensions of IP searchable data? Try the Geolocation API, returning up to 20 geo data points of custom query information per IP address. Or the Domains API that retrieves ...
API
Twitter data from millions of tweets! This is a download of Twitter data from March 2006 to November 2009. The data comes from analysis on the full set of tweets during that time period, which is 35 million users, over 500 million tweets, and more than 1 billion relationships between users. This dataset maps Twitter screen names to a user’s corresponding Twitter API ...
$20.00
From the CALO Project at Carnegie Mellon University, a massive dataset of emails recovered from discovery documents in the Enron trials. From the distribution page: This dataset was collected and prepared by the CALO Project (A Cognitive Assistant that Learns and Organizes). It contains data from about 150 users, mostly senior management of Enron, organized into ...
Offsite
Twitter data from millions of tweets! This is a download of Twitter data from March 2006 to November 2009. The data set consists of “tokens,” which are hashtags (#data), URLs, or emoticons (Twitter smileys or other “faces” created using keyboard characters). The data comes from analysis on the full set of tweets during that time period, which is 35 million ...
$300.00
Twitter data from millions of tweets! This is a download of Twitter data from March 2006 to November 2009. The data set consists of “tokens,” which are hashtags (#data), URLs, or emoticons (Twitter smileys or other “faces” created using keyboard characters). The data comes from analysis on the full set of tweets during that time period, which is 40 million ...
$1,000.00
Twitter smiley data from millions of tweets! This is a free download of Twitter data from March 2006 to November 2009. The smiley data comes from analysis on the full set of tweets during that time period, which is 35 million users, over 500 ...
Free
A word list with over 100,000 entries that are officially permitted in crossword games like Scrabble™. This word list is available in a simple, alphabetically ordered Excel format, making it convenient for reference, for spell-checking, or, in more sophisticated applications, for developers looking to build a custom spelling dictionary. The entries include variants of ...
Free
DMOZ100k06 is a large research data set about document metadata based on a random sample of 100,000 web documents from the Open Directory combined with data retrieved from the social bookmarking service delicious.com, the content rating system ICRA, and the search engine Google. The data set is freely available for other research.
Michael G. Noll
Offsite
This collection consists of ~20M web queries collected from ~650k users over three months. The data is sorted by anonymous user ID and sequentially arranged. The goal of this collection is to provide real query log data that is based on real users. It could be used for personalization, query reformulation or other types of search research. From AOL’s original Read-Me ...
Offsite
List of summonable objects from the Nintendo DS game Scribblenauts, from AARDVARK, ABOMINABLE SNOWMAN and ABSCONDER to ZOMBIE, ZUNICERATOPS and ZYGOTE. Via the Scribblenauts Wikipedia entry: Scribblenauts is an emergent puzzle action video game with the tagline “Write Anything, Solve Everything”. Its objective is to complete puzzles by summoning any object (from a ...
Free
This data is derived from the MySpace real-time stream API. The word counts come from these free-form text fields: MySpace moods, forum topic titles, replies to forum topics, text from sharing a link or item, and status mood updates. Words from these fields have been extracted for the last three months, and this dataset contains their totals binned by day.
$25.00
This data is derived from the MySpace real-time stream API. The word counts come from these free-form text fields: MySpace moods, forum topic titles, replies to forum topics, text from sharing a link or item, and status mood updates. Words from these fields have been extracted for the last three months, and this dataset contains their totals binned by hour.
$50.00
This data is derived from the MySpace real-time stream API. It contains all users in our dataset, around 11 million, with well-formed zip codes.
$120.00
The data is derived from the MySpace real-time stream API. It contains all of the users in the dataset, around 11 million, with well-formed latitude and longitude.
$150.00
Twitter influence metrics with the click of a button! Trstrank measures Twitter user reputation, importance and influence in a way far more robust than counting the number of followers. It is a sophisticated measure of a user’s relative importance among the entire Twitter network. The API measures Twitter influence across two dimensions for each query: the ...
API
A fun Marvel Comics character collaboration graph constructed by Cesc Rosselló, Ricardo Alberich, and Joe Miro from the University of the Balearic Islands. The Marvel Universe, that is, the artificial world that takes place in the universe of the Marvel comic books, is an example of a social collaboration network. They compare the characteristics of this universe to ...
Free
Twitter influence – how to measure it? Let us count the ways: enthusiasm, feedness, sway, follow churn, trstrank, followers, outflux, interesting, chattiness, follow rate, influx… How Does The Twitter Influence API Work? The Twitter Influence API performs a “scrape” of the social network to gather and evaluate individual instances of metrics like mentions, ...
API
The AOL Search Data is a collection of real query log data that is based on real users. The data set consists of 20M web queries collected from 650k users over three months. These private searches are perfect for research and mining. The data is sorted by anonymous user ID and sequentially arranged. The collection can be used for personalization, query reformulation or ...
Free