Public Datasets
Based on Original List from Prof. Polo Chau – Fall 2017
Dataset Collections
- Google public datasets
- Yahoo Webscope
- Data.gov: the U.S. Government’s open data
- UCI Machine Learning Repository
- Internet Archive
- Kaggle Datasets
- Stanford Large Network Dataset Collection
- DBpedia
- AWS Open Data
- Microsoft Research Public Data
Datasets for Specific Domains
- IPEDS Data: Postsecondary education data from the National Center for Education Statistics
- Bureau of Labor Statistics data
- OpenAlex: data on scientific publication records
- NYC Taxi data
- Zillow: real estate listings
- Dataset about soccer games, players, and clubs: no API, but easy to scrape. For each player: transfer history, performance, nationality, birth date, etc. For each club: performance, squad, etc.
Note: If you know of other useful public datasets, please let us know so we can add them to this ongoing list.