Data sets tagged with "google"

Document Metadata Based on a Sample of Web Documents from the Open Directory

DMOZ100k06 is a large research data set about document metadata based on a random sample of 100,000 web documents from the Open Directory combined with data retrieved from the social bookmarking service delicious.com, the content rating system ICRA, and the search engine Google. The data set is freely available for other research. Michael G. Noll
Offsite

PyGTrends: Python API for Google Trends Data

This Python module is a quasi-API that makes it easier to authenticate with Google Trends, for those who want to squeeze an extra level of functionality out of their data. The advantage of programmatic access is that the data can be automatically trended and merged. It can be snuck into a 9:00 AM daily email to the VP of Marketing so that she knows to ramp up Google AdWords ...
Offsite

Google Flu Trends | How does this work?

Google Flu Trends uses aggregated Google search data to estimate current flu activity around the world in near real-time. This site has a visualization of Google Flu Trends in comparison to the CDC’s data. There is also a link to a dataset of Google Flu Trends weekly influenza activity estimates for the world, from December 2002 to the present. Each week, millions of ...
Offsite

Linguistic Data Consortium (LDC) - Collection of Linguistic Corpora and Datasets

The Linguistic Data Consortium is an open consortium of universities, companies and government research laboratories. It creates, collects and distributes speech and text databases, lexicons, and other resources for research and development purposes. The University of Pennsylvania is the LDC’s host institution. The LDC was founded in 1992 with a grant from the Advanced ...
Offsite

Google Video

This dataset consists of a collection of Infoboxes from Wikipedia on the topic of Google Video.
Free

GoogleTransitDataFeed

A list of publicly accessible transit data feeds: transit schedule data published by transit agencies and operators in GTFS format for developers to use. The feeds contain scheduled times, stop locations, route information and, optionally, fare information and detailed route shapes.
Offsite
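
As a rough illustration of the stop locations such a feed carries, here is a minimal Python sketch that reads the stops.txt file of a GTFS feed. The column names (stop_id, stop_name, stop_lat, stop_lon) come from the GTFS specification; the sample rows and the parse_stops helper are made up for illustration, and the sketch assumes every row carries coordinates, which real feeds do not always guarantee.

```python
import csv
import io

# Hypothetical sample in the shape of a GTFS stops.txt file; the values are made up.
SAMPLE_STOPS_TXT = """stop_id,stop_name,stop_lat,stop_lon
S1,Main St & 1st Ave,40.7128,-74.0060
S2,Central Station,40.7306,-73.9866
"""


def parse_stops(text):
    """Parse GTFS stops.txt content into a list of dicts with numeric coordinates."""
    reader = csv.DictReader(io.StringIO(text))
    stops = []
    for row in reader:
        stops.append({
            "stop_id": row["stop_id"],
            "stop_name": row["stop_name"],
            # Assumes coordinates are present; float() raises ValueError otherwise.
            "lat": float(row["stop_lat"]),
            "lon": float(row["stop_lon"]),
        })
    return stops


stops = parse_stops(SAMPLE_STOPS_TXT)
for stop in stops:
    print(stop["stop_id"], stop["stop_name"], stop["lat"], stop["lon"])
```

With a real feed, the same function could be given the text of the stops.txt file extracted from the feed's zip archive.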

Google Books Ngrams

Here are the datasets backing the Google Books Ngram Viewer. These datasets were generated in July 2009; we will update these datasets as our book scanning continues, and the updated versions will have distinct and persistent version identifiers (20090715 for the current set). Each of the links will directly download a fragment of the given corpus. For ...
Offsite

Arabic Sports Keywords

A list of the most popular sports-related keywords in the Arabic language, mainly from Arab countries. This dataset shows the keywords and how many times they were requested on Google and appeared for a sports advertiser. Problems with the numbers: 1. Not all campaigns have the same budget, due to targeting and pricing issues, so this will affect the accuracy. 2. The ...
Free

Google Labs - Books Ngram Viewer

Here are the datasets backing the Google Books Ngram Viewer. These datasets were generated in July 2009; we will update these datasets as our book scanning continues, and the updated versions will have distinct and persistent version identifiers (20090715 for the current set). Each of the links below will directly download a fragment of the given corpus. For instance, ...
Offsite
