Pete Skomoroch's Bookmarks
Description:
Pete Skomoroch is President and Lead Consultant at Data Wrangling in Arlington, VA, a firm which specializes in mining large datasets to solve problems in search, finance, and recommendation systems.
He maintains an ever-expanding list of datasets (nearly 400 at last count!), which have now been incorporated into the Infochimps repository.
Created over 2 years ago by Infochimps
Updated over 2 years ago
Article Search API - NYTimes.com
With the Article Search API, you can search New York Times articles from 1981 to today, retrieving headlines, abstracts, lead paragraphs, links to associated multimedia and other article metadata. Along with standard keyword searching, the API also offers faceted searching. The available facets include Times-specific fields such as sections, taxonomic classifiers and ...
Offsite
Information Extraction: The RISE Repository of Information Sources
RISE is a distributed repository of online information sources that are used for the empirical analysis of learning algorithms that generate extraction patterns. The sources included in this repository are provided by people from the information extraction (IE) and wrapper generation (WG) communities. Both communities use machine learning algorithms to generate ...
Offsite
Visualizing the Growth of Target, 1962-2008 | FlowingData
The first Target opened in 1962 in Roseville, Minnesota, and by 1972 there were 46. The corporation focused mostly on expansion in the Central United States for the next decade, but in 1982, Target acquired 33 FedMart stores in Arizona, California, and Texas. There are now over 1,600 stores across the United States. This visualization shows the location and opening date ...
Offsite
Digging into Data - Various Repositories
A list of digital libraries, data archives, and data repositories that are inviting Digging into Data researchers to use their collections. For each repository, you’ll find a description of their contents, contact information, and other details.
Offsite
Best Buy Remix - Welcome to the Best Buy Remix Developer Network
Opening up data gives it a purpose. Feel free to build upon our data. Be our guest.
BBYOpen offers a RESTful interface
It’s free, and commission can be earned
Well documented
Lots of samples & tutorials
Offsite
Twibs : Find the Businesses on Twitter
Twibs was created by a small group of people with one purpose: give Twitter users a place to find businesses on Twitter. The Twibs founders are big believers in the power of Twitter to connect customers with businesses. They are working on making it easy for consumers to find businesses, both local and national. Keep in mind, they’re just getting started, so there may ...
Offsite
Free Book Usage Data from the University of Huddersfield
The University of Huddersfield released a major portion of their book circulation and recommendation data under an Open Data Commons/CC0 licence. In total, there’s data for over 80,000 titles derived from a pool of just under 3 million circulation transactions spanning a 13-year period. The data they’ve released essentially comes in two big chunks: 1) Circulation ...
Offsite
Sparse Matrix Collection : Sparse Matrices From a Wide Range of Applications
These matrices cover a wide spectrum of domains, including those arising from problems with underlying 2D or 3D geometry (such as structural engineering, computational fluid dynamics, model reduction, electromagnetics, semiconductor devices, thermodynamics, materials, acoustics, computer graphics/vision, robotics/kinematics, and other discretizations) and those that ...
Offsite
Face Detection
Hello, I want a face detection dataset. Thanks.
Offsite
NORB Object Recognition Dataset, Fu Jie Huang, Yann LeCun, New York University
This database is intended for experiments in 3D object recognition from shape. It contains images of 50 toys belonging to 5 generic categories: four-legged animals, human figures, airplanes, trucks, and cars. The objects were imaged by two cameras under 6 lighting conditions, 9 elevations (30 to 70 degrees every 5 degrees), and 18 azimuths (0 to 340 every 20 degrees). ...
Offsite
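The imaging grid described for the NORB dataset can be enumerated as a quick sanity check on the stated counts. This is a minimal sketch based only on the numbers in the description above; the names `ELEVATIONS`, `AZIMUTHS`, and `imaging_conditions` are hypothetical and are not part of any NORB distribution or loader.

```python
from itertools import product

# Values taken from the dataset description; names are illustrative only.
CAMERAS = range(2)                 # two cameras
LIGHTINGS = range(6)               # 6 lighting conditions
ELEVATIONS = range(30, 71, 5)      # 30 to 70 degrees, every 5 degrees
AZIMUTHS = range(0, 341, 20)       # 0 to 340 degrees, every 20 degrees


def imaging_conditions():
    """Return every (camera, lighting, elevation, azimuth) combination."""
    return list(product(CAMERAS, LIGHTINGS, ELEVATIONS, AZIMUTHS))


if __name__ == "__main__":
    conditions = imaging_conditions()
    print(len(ELEVATIONS), "elevations,", len(AZIMUTHS), "azimuths")
    print(len(conditions), "camera/lighting/pose combinations per object")
```

The step sizes reproduce the 9 elevations and 18 azimuths quoted in the description, which confirms that both ranges are inclusive of their endpoints.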
AFL-CIO Executive PayWatch Database
An index of company names, each linking to the CEO’s total compensation and showing how that compensation compares to your and other workers’ earnings.
Offsite
Google Flu Trends | How does this work?
Google Flu Trends uses aggregated Google search data to estimate current flu activity around the world in near real-time. This site has a visualization of Google Flu Trends in comparison to the CDC’s data. There is also a link to a dataset of Google Flu Trends weekly influenza activity estimates for the world, from December 2002 to the present. Each week, millions of ...
Offsite


