Datasets for AI / ML Research

  1. UCI Machine Learning Repository: A collection of databases, domain theories, and data generators widely used by the machine learning community. Website: UCI ML Repository

  2. Kaggle Datasets: Offers a wide variety of datasets in different domains including economics, biology, computer vision, and natural language processing. Website: Kaggle

  3. AWS Public Datasets: Amazon Web Services offers a variety of public datasets that anyone can access. Website: AWS Public Datasets

  4. Google Dataset Search: A tool that enables the discovery of datasets stored across the web. Website: Google Dataset Search

  5. Microsoft Research Open Data: A collection of free datasets from Microsoft Research to advance state-of-the-art research in areas such as natural language processing, computer vision, and domain-specific sciences. Website: Microsoft Research Open Data

  6. OpenML: An online platform for collaborative machine learning where users share data, models, and experiments. Website: OpenML

  7. Data.gov: The home of the U.S. Government's open data, providing data, tools, and resources. Website: Data.gov

  8. EU Open Data Portal: Provides access to an expanding range of data from the European Union institutions and other EU bodies. Website: EU Open Data Portal

  9. Awesome Public Datasets on GitHub: A collection of high-quality open datasets in public domains. GitHub Repository: Awesome Public Datasets

  10. World Bank Open Data: Free and open access to global development data. Website: World Bank Open Data

  11. CERN Open Data Portal: Provides access to data generated by the Large Hadron Collider and other CERN experiments. Website: CERN Open Data Portal

  12. National Aeronautics and Space Administration (NASA): Offers a wide range of datasets related to space and Earth sciences. Website: NASA

  13. NOAA Data Sets: Provides access to national and global data on climate, weather, oceans, and coasts. Website: NOAA

  14. ImageNet: A dataset of more than 14 million labeled images organized into roughly 22,000 categories. Website: ImageNet

  15. COCO (Common Objects in Context): A dataset of roughly 330,000 images of objects in complex scenes, annotated with about 1.5 million object instances across 80 categories. Website: COCO Dataset

  16. Wikipedia: List of datasets for machine-learning research: A Wikipedia article providing a comprehensive list of datasets for machine-learning research. Website: Wikipedia List

  17. Natural Earth Data: Offers free vector and raster map data at various scales. Website: Natural Earth Data

  18. Reddit Datasets: A subreddit where Reddit community members share datasets. Website: Reddit Datasets

  19. Quandl (now Nasdaq Data Link): Provides financial, economic, and alternative datasets. Website: Quandl

  20. Stanford Large Network Dataset Collection: A collection of large network datasets including social networks, web graphs, etc. Website: Stanford Network Analysis Project

These sources cover a wide range of domains. Choose among them based on the data type, scale, and subject area your machine learning work requires.
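
Several of the repositories listed here (for example UCI, Kaggle, and Data.gov) distribute tabular data as plain CSV files. The following is a minimal sketch, not a prescribed workflow, of loading such a file and holding out a test set using only the Python standard library. It assumes a headerless file whose last column is the class label, as in UCI's `iris.data`. The inline `SAMPLE_CSV` string stands in for a downloaded file, and the helper names `load_labeled_csv` and `train_test_split` are illustrative, not part of any library.

```python
import csv
import io
import random

# Tiny inline sample in the headerless "features..., label" layout used by
# files such as UCI's iris.data. It stands in for a downloaded file.
SAMPLE_CSV = """\
5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
7.0,3.2,4.7,1.4,Iris-versicolor
6.4,3.2,4.5,1.5,Iris-versicolor
6.3,3.3,6.0,2.5,Iris-virginica
5.8,2.7,5.1,1.9,Iris-virginica
"""


def load_labeled_csv(text):
    """Parse headerless CSV text into (features, labels); the last column is the label."""
    features, labels = [], []
    for row in csv.reader(io.StringIO(text)):
        if not row:  # skip blank lines, which some raw files end with
            continue
        features.append([float(value) for value in row[:-1]])
        labels.append(row[-1])
    return features, labels


def train_test_split(features, labels, test_fraction=0.33, seed=0):
    """Shuffle row indices reproducibly, then split them into train and test sets."""
    indices = list(range(len(features)))
    random.Random(seed).shuffle(indices)
    n_test = max(1, int(len(indices) * test_fraction))  # keep at least one test row
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    train = [(features[i], labels[i]) for i in train_idx]
    test = [(features[i], labels[i]) for i in test_idx]
    return train, test


features, labels = load_labeled_csv(SAMPLE_CSV)
train, test = train_test_split(features, labels)
print(len(features), "rows;", len(train), "train,", len(test), "test")
```

For a real file, the same function can be fed the result of `open(path, newline="").read()`. Larger datasets are usually better handled with a dedicated library such as pandas, which this sketch avoids only to stay dependency-free.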