A Complete List of Publicly Available Datasets for Machine Learning

Cross-disciplinary data repositories, data collections and data search engines

Kaggle Datasets – The best place to discover and seamlessly analyze publicly available data.

CrowdFlower – Our Data for Everyone library is a collection of our favorite open data jobs that have come through our platform. They’re available free of charge for the community, forever.

Microsoft Research datasets – Microsoft Research provides a continuously refreshed collection of free datasets, tools and resources designed to advance the state of the art of academic research in many areas of computer science, such as natural language processing and computer vision. In addition, you can browse datasets and apply for cloud-based compute cycles available under the Azure for Research program.

University of California – Irvine datasets – We currently maintain 351 data sets as a service to the machine learning community. You may view all data sets through our searchable interface. Our old web site is still available, for those who prefer the old format. For a general overview of the Repository, please visit our About page. For information about citing data sets in publications, please read our citation policy. If you wish to donate a data set, please consult our donation policy. For any other questions, feel free to contact the Repository librarians. We have also set up a mirror site for the Repository.

AWS Public Data Sets – Public Data Sets on AWS provides a centralized repository of public data sets that can be seamlessly integrated into AWS cloud-based applications. AWS is hosting the public data sets at no charge for the community, and like all AWS services, users pay only for the compute and storage they use for their own applications. Learn more about Public Data Sets on AWS and visit the Public Data Sets forum.

Open Data Inception – A Comprehensive List of 2500+ Open Data Portals in the World

re3data.org – re3data.org is a global registry of research data repositories covering many academic disciplines. It lists repositories that provide permanent storage of, and access to, data sets for researchers, funding bodies, publishers and scholarly institutions.

DataCite – Locate, identify, and cite research data with the leading global provider of DOIs for research data.
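Every DOI, including those DataCite registers for research data, resolves through the doi.org proxy, so a DOI string can be turned into a citable link. The following is a minimal Python sketch, not DataCite's own tooling: the pattern is adapted from a regular expression Crossref has published for matching most modern DOIs (a heuristic, not the full DOI syntax), and the DOI used in the example, `10.1234/example-dataset.v1`, is hypothetical.

```python
import re

# Heuristic pattern: "10." + a 4-9 digit registrant code + "/" + a suffix.
# Adapted from a pattern Crossref published for modern DOIs; it does not
# cover every string the DOI syntax allows.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/[-._;()/:A-Z0-9]+$", re.IGNORECASE)


def doi_to_url(doi: str) -> str:
    """Return the https://doi.org resolver URL for a DOI string."""
    doi = doi.strip()
    # Accept the common "doi:" prefix and drop it before checking.
    if doi.lower().startswith("doi:"):
        doi = doi[4:]
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"not a recognisable DOI: {doi!r}")
    return "https://doi.org/" + doi


# Hypothetical DOI, used only to illustrate the conversion.
print(doi_to_url("doi:10.1234/example-dataset.v1"))
```

The check is deliberately strict so that typos fail loudly instead of producing a dead link; loosen the pattern if you need to handle older DOIs with unusual characters.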

figshare – figshare helps academic institutions store, share and manage all of their research outputs.

LinkedData.org – Linked Data is about using the Web to connect related data that wasn’t previously linked, or using the Web to lower the barriers to linking data currently linked using other methods. More specifically, Wikipedia defines Linked Data as “a term used to describe a recommended best practice for exposing, sharing, and connecting pieces of data, information, and knowledge on the Semantic Web using URIs and RDF.”
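The core idea of Linked Data, that independently published data becomes connected when both sides use the same URI for the same thing, can be shown with plain (subject, predicate, object) triples. This is a minimal sketch in plain Python, not an RDF implementation: there is no serialization, datatyping or SPARQL, and all URIs sit under the hypothetical namespace `http://example.org/`. The helper names `merge`, `objects` and `follow` are illustrative only.

```python
from typing import Iterable, List, Set, Tuple

# A triple is (subject, predicate, object); subjects and predicates are URIs,
# objects are either URIs or plain literal strings.
Triple = Tuple[str, str, str]

# Hypothetical namespace: example.org is reserved for documentation examples.
EX = "http://example.org/"

# Two independently published sets of triples. They are "linked" only because
# both use the same URI, EX + "org/uci", to name the publisher.
catalog: Set[Triple] = {
    (EX + "dataset/iris", EX + "prop/title", "Iris"),
    (EX + "dataset/iris", EX + "prop/publisher", EX + "org/uci"),
}
organisations: Set[Triple] = {
    (EX + "org/uci", EX + "prop/name", "UCI Machine Learning Repository"),
}


def merge(*graphs: Iterable[Triple]) -> Set[Triple]:
    """Union several triple sets into a single graph."""
    merged: Set[Triple] = set()
    for graph in graphs:
        merged.update(graph)
    return merged


def objects(graph: Set[Triple], subject: str, predicate: str) -> List[str]:
    """Return the objects of all triples matching (subject, predicate, ?), sorted."""
    return sorted(o for s, p, o in graph if s == subject and p == predicate)


def follow(graph: Set[Triple], start: str, *path: str) -> List[str]:
    """Follow a chain of predicates from a start node and return the end nodes."""
    nodes = [start]
    for predicate in path:
        nodes = [o for n in nodes for o in objects(graph, n, predicate)]
    return nodes


web = merge(catalog, organisations)
print(follow(web, EX + "dataset/iris", EX + "prop/publisher", EX + "prop/name"))
```

Querying `catalog` alone stops at the bare publisher URI; only after merging with `organisations` does the shared URI lead to the publisher's name. In practice you would use a real RDF toolkit (for example rdflib in Python) and published vocabularies rather than tuples.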

thewebminer.com – The web contains a multitude of data items that might be useful to a business, but most of them are difficult to obtain in an integrated form. Our services can make this process remarkably easy, whether the data comes from a single source or from multiple sources.

datahub.io – The free, powerful data management platform from the Open Knowledge Foundation, based on the CKAN data management system.
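CKAN-based portals expose a JSON Action API under `/api/3/action/`, and its `package_search` action returns matching datasets. The sketch below builds such a request and extracts dataset titles from the response. It is a minimal sketch assuming a generic CKAN instance at the hypothetical address `https://ckan.example.org`; it does not assume that datahub.io itself still serves this API, and it leaves out the network call so the parsing can be tried on a saved response.

```python
import json
from typing import List
from urllib.parse import urlencode

# Hypothetical base URL of a CKAN instance; substitute a real portal's address.
CKAN_BASE = "https://ckan.example.org"


def package_search_url(base: str, query: str, rows: int = 10) -> str:
    """Build a CKAN Action API (v3) package_search request URL."""
    params = urlencode({"q": query, "rows": rows})
    return f"{base.rstrip('/')}/api/3/action/package_search?{params}"


def dataset_titles(response_body: str) -> List[str]:
    """Extract dataset titles from a package_search JSON response body."""
    payload = json.loads(response_body)
    # CKAN wraps every action response as {"success": ..., "result": ...}.
    if not payload.get("success"):
        raise RuntimeError("CKAN reported an unsuccessful request")
    return [pkg["title"] for pkg in payload["result"]["results"]]


print(package_search_url(CKAN_BASE, "climate data", rows=5))
```

To query a live portal, fetch the URL with `urllib.request.urlopen(url).read().decode("utf-8")` and pass the text to `dataset_titles`.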

Quandl – Quandl helps data analysts save time, effort and money by delivering high-quality financial and economic data in the precise format they need.

Datasets for Data Mining and Data Science

Single datasets and data repositories
