Ten sources of free big data on the Internet


I am often asked by clients where they can find public datasets for analysis and for inclusion in their own data analytics. This brief list of ten key sources shows the range of data available from national and international organisations. Most of this data is freely available and is updated regularly.
Some of these datasets are very large: one source alone provides some 300 TB.

10 Sources of Data

Data provided by the UK open data program. Includes government statistics on economics, finance, health and agriculture.
The US Government's open data portal, divided into sections including agriculture, finance and business. Contains over 180,000 datasets for public access.

US census data on population, detailed down to block level. The site also provides tools for analysis or, alternatively, bulk data download.
Eurostat, the statistical office of the European Union, provides the EU's open data repository. It covers a huge range of topics including census data, business data, migration data, the economy, agriculture and health.
Amazon Datasets
Amazon provides a range of open data hosted on its S3 storage. The data is free, though charges apply for any processing on its AWS platform.
The data available includes Landsat satellite imagery (updated daily), climate data, the Million Song collection of 28 music datasets, social media data and genome data from the Human Genome Project.
CERN, the European Organization for Nuclear Research, provides open data on a number of its experiments. For example, the Large Hadron Collider has provided some 300 TB of data, some of it processed to make it suitable for schools and colleges.
The World Bank provides a huge range of development and economic data via its open data program. This often includes easy-to-use software interfaces in addition to direct data download. One example is the World Development Indicators database. You can extract data from this database easily using their software. I wrote a program to illustrate access to this dataset here: https://tendron.shinyapps.io/WorldBank1/
An extensive range of economics datasets is listed on this website, including stock market data, government bond data and GDP data, among a wide range of other economic and financial time series.

A list of datasets available via Microsoft Azure. Many, though not all, are free.
The Bank of England provides a large range of banking, monetary and financial statistics in the Statistical Interactive Database. Other datasets include forecasts for the UK economy and statistics on public finance and spending.
European economy data can be downloaded from the Europa portal site. The datasets are held on the site of the Directorate-General for Economic and Financial Affairs (DG ECFIN). The home page of the directorate is: http://ec.europa.eu/economy_finance/index_en.htm.
On this site you will find a whole range of statistics for each of the OECD member countries, the euro area and the OECD as a whole. The statistics are arranged by topic group, including National Accounts, Finance, Agriculture, Development, International Trade, Labour, Prices, Public Management and Short-term Economic Statistics.
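
As a rough illustration of the programmatic access mentioned for the World Bank's World Development Indicators, here is a minimal Python sketch against the public World Bank API (version 2). The helper names, the country code GB and the indicator code NY.GDP.MKTP.CD (GDP in current US dollars) are illustrative choices, and the sketch assumes the response is a two-element JSON list of paging metadata followed by observation records. It is not the code behind the Shiny app linked above.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "https://api.worldbank.org/v2"  # public World Bank API, version 2


def build_url(country, indicator, per_page=100):
    """Build the request URL for one country and one indicator."""
    query = urlencode({"format": "json", "per_page": per_page})
    return f"{BASE_URL}/country/{country}/indicator/{indicator}?{query}"


def parse_observations(payload):
    """Convert a decoded JSON response into a {date: value} dict.

    Assumes the payload is [paging_metadata, [records...]]; anything
    else (for example an error message) yields an empty dict.
    Records with no value for a given date are skipped.
    """
    if not isinstance(payload, list) or len(payload) < 2 or payload[1] is None:
        return {}
    return {
        row["date"]: row["value"]
        for row in payload[1]
        if row.get("value") is not None
    }


def fetch_indicator(country, indicator):
    """Download one indicator series (needs network access)."""
    with urlopen(build_url(country, indicator), timeout=30) as response:
        return parse_observations(json.load(response))
```

Calling `fetch_indicator("GB", "NY.GDP.MKTP.CD")` should return the first page of the UK GDP series keyed by year. Keeping the parsing separate from the download makes it easy to check the logic against a saved response without going online.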

Summary

The free datasets above provide enormous quantities of data suitable for professional analysis. CERN, for example, provides 300 TB from its Large Hadron Collider experiment alone. It also provides processed datasets that are suitable for school and college projects.
