Hadoop & Big Data

My landing page for "all things Hadoop", Big Data, and related technologies.  The content is rather unstructured right now, but I'll get there.  Take a look at David Streever's Hadoop space.

 

I try to post regularly about Big Data technologies on my Professional Blog on this site; just look for content with the "hadoop", "spark", and/or "big_data" labels as shown below.  Feel free to offer up thoughts on what my Upcoming Blog Posts should be about.


NOTE: The remainder of this document has no meaningful structure; it is mostly a parking lot of ideas and links that I will SOMEDAY come back to and organize.  Thanks, Lester Martin.

Best Practices for 3rd Party JARs

Figure out what the best practice is for using 3rd party JARs in Hadoop jobs.  Some notes at http://stackoverflow.com/questions/16825821/parsing-json-input-in-hadoop-java to get this topic going.

Generic Convert Uncompressed Text File to Snappy Encoded Sequence File

Based on thoughts from http://blog.cloudera.com/blog/2011/09/snappy-and-hadoop/ and http://stackoverflow.com/questions/5377118/how-to-convert-txt-file-to-hadoops-sequence-file-format, write a simple utility that converts uncompressed text files to sequence files and compresses them with Snappy.  Or... am I overthinking this, and is there a far easier way to do it?

Hadoop in the Small

Of course... I want, no NEED, to build a Hadoop cluster with Raspberry Pi devices as seen in the following URLs:

Maybe I could do it with Java on the BeagleBoard?  Or maybe just write a very straightforward post like http://java.dzone.com/articles/getting-hadoop-and-running.

Bureau of Labor Statistics Example

How about a project using the BLS OES datasets?

Integration with MongoDB

Investigate the MongoDB Connector for Hadoop, as called out at http://www.mongodb.com/press/integration-hadoop-and-mongodb-big-data%E2%80%99s-two-most-popular-technologies-gets-significant.