Public Datasets and Analysis
Ultimately, this will be a decent directory of datasets and sources available via the Internet. Here are some starters:
Bit Dotio; https://bit.io/bitdotio/
NYC OpenData; https://opendata.cityofnewyork.us/
Revolution Analytics' Finding Data on the Internet
Medicare Provider Utilization and Payment Data: Physician and Other Supplier
Not free and not "big", but decent-looking person-based sample data is available at http://www.briandunning.com/sample-data/.
US gov: http://www.data.gov
UK gov: http://www.data.gov.uk
UCI repository: http://archive.ics.uci.edu/ml/
InfoChimps: http://www.infochimps.com/datasets
NASA: http://data.nasa.gov
UCI Machine Learning Repository (dataset list): http://archive.ics.uci.edu/ml/datasets.html
NYC Taxi Trips; http://www.nyc.gov/html/tlc/html/about/trip_record_data.shtml
FAA Flight Statistics; http://www.transtats.bts.gov/DL_SelectFields.asp?Table_ID=236&DB_Short_Name=On-Time
The goal is to consume these datasets with Hadoop and other Big Data tools.
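As a rough illustration of what consuming one of these datasets with Hadoop might look like, here is a minimal sketch of a map and reduce step written in the style of a Hadoop Streaming job. It counts how many CSV rows share each value in a chosen column. The function names, the column index, and the sample rows are all made up for illustration and are not taken from any of the datasets above. In a real Streaming job the mapper and reducer would be separate scripts reading standard input and writing tab-separated lines; here the shuffle/sort phase is simulated locally with sorted().

```python
import csv
from itertools import groupby


def mapper(lines, column=0):
    """Emit (value, 1) for the chosen column of each CSV row.

    `column` is a hypothetical zero-based index; rows too short to
    contain it are skipped.
    """
    for row in csv.reader(lines):
        if len(row) > column:
            yield row[column].strip(), 1


def reducer(pairs):
    """Sum the counts for each key.

    Expects pairs sorted by key, which is what the Hadoop shuffle/sort
    phase delivers to a reducer.
    """
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        yield key, sum(count for _, count in group)


if __name__ == "__main__":
    # Made-up rows standing in for a real dataset file.
    sample = ["JFK,LAX,12", "JFK,ORD,3", "SFO,JFK,-4"]
    # sorted() plays the role of the shuffle/sort phase between map and reduce.
    for key, total in reducer(sorted(mapper(sample, column=0))):
        print(f"{key}\t{total}")
```

Running the same two functions over a local file first is a cheap way to check the logic before submitting anything to a cluster.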
Some other possibilities
https://mixedanalytics.com/blog/list-actually-free-open-no-auth-needed-apis/
https://github.com/robertoduessmann/weather-api?tab=readme-ov-file
https://www.weather.gov/documentation/services-web-api#/default/glossary
https://www.reddit.com/r/apachekafka/comments/s5gwbr/good_source_of_freepublic_events_to_experiment/