Pete Skomoroch's Bookmarks

Description:

Pete Skomoroch is President and Lead Consultant at Data Wrangling in Arlington, VA, a firm which specializes in mining large datasets to solve problems in search, finance, and recommendation systems.

He maintains an ever-expanding list of datasets (nearly 400 at last count!), which have now been incorporated into the Infochimps repository.

Created over 2 years ago by Infochimps

Updated over 2 years ago

Article Search API - NYTimes.com

With the Article Search API, you can search New York Times articles from 1981 to today, retrieving headlines, abstracts, lead paragraphs, links to associated multimedia and other article metadata. Along with standard keyword searching, the API also offers faceted searching. The available facets include Times-specific fields such as sections, taxonomic classifiers and ...

Information Extraction: The RISE Repository of Information Sources

RISE is a distributed repository of online information sources that are used for the empirical analysis of learning algorithms that generate extraction patterns. The sources included in this repository are provided by people from the information extraction (IE) and wrapper generation (WG) communities. Both communities use machine learning algorithms to generate ...

Visualizing the Growth of Target, 1962-2008 | FlowingData

The first Target opened in 1962 in Roseville, Minnesota, and by 1972 there were 46. The corporation focused mostly on expansion in the Central United States for the next decade, but in 1982, Target acquired 33 FedMart stores in Arizona, California, and Texas. There are now over 1,600 stores across the United States. This visualization shows the location and opening date ...

Digging into Data - Various Repositories

A list of digital libraries, data archives, and data repositories that are inviting Digging into Data researchers to use their collections. For each repository, you’ll find a description of their contents, contact information, and other details.

Twibs : Find the Businesses on Twitter

Twibs was created by a small group of people with one purpose: to give Twitter users a place to find businesses on Twitter. The Twibs founders are big believers in the power of Twitter to connect customers with businesses. They are working on making it easy for consumers to find businesses, both local and national. Keep in mind, they’re just getting started, so there may ...

Free Book Usage Data from the University of Huddersfield

The University of Huddersfield released a major portion of their book circulation and recommendation data under an Open Data Commons/CC0 licence. In total, there’s data for over 80,000 titles derived from a pool of just under 3 million circulation transactions spanning a 13-year period. The data they’ve released essentially comes in two big chunks: 1) Circulation ...

Sparse Matrix Collection : Sparse Matrices From a Wide Range of Applications

These matrices cover a wide spectrum of domains, including those arising from problems with underlying 2D or 3D geometry (such as structural engineering, computational fluid dynamics, model reduction, electromagnetics, semiconductor devices, thermodynamics, materials, acoustics, computer graphics/vision, robotics/kinematics, and other discretizations) and those that ...

NORB Object Recognition Dataset, Fu Jie Huang, Yann LeCun, New York University

This database is intended for experiments in 3D object recognition from shape. It contains images of 50 toys belonging to 5 generic categories: four-legged animals, human figures, airplanes, trucks, and cars. The objects were imaged by two cameras under 6 lighting conditions, 9 elevations (30 to 70 degrees, every 5 degrees), and 18 azimuths (0 to 340 degrees, every 20 degrees). ...

AFL-CIO Executive PayWatch Database

An index of company names, each linking to the CEO’s total compensation, so you can see how that compensation compares with your own earnings and those of other workers.

Google Flu Trends | How does this work?

Google Flu Trends uses aggregated Google search data to estimate current flu activity around the world in near real-time. This site has a visualization of Google Flu Trends in comparison to the CDC’s data. There is also a link to a dataset of Google Flu Trends weekly influenza activity estimates for the world, from December 2002 to the present. Each week, millions of ...
