...

Info

I'm taking some creative liberty and assuming that you've worked through (or at least understand the concepts of) some/most/all of the Sandbox tutorials.  For example, check out Tutorial 1 to see how to upload this file.  If at any point in this blog posting you are unfamiliar with how to do an activity, check back with the tutorials for help.  If you can't find which one can help you, let me know in the comments section and I'll reply with more info.

...

Now we just need to build a simple converter script.  There are plenty of good resources out there on Pig, including the tutorials; I'm even trying to build up a Pig Cheat Sheet.  For the one you'll see below, I basically stripped down the info found at https://bluewatersql.wordpress.com/2013/04/17/shakin-bacon-using-pig-to-process-data/ (remember, a lazy programmer is a good programmer & imitation is the sincerest form of flattery), which is worth checking out.

...

Code Block
languagetext
REGISTER piggybank.jar;
define SUBSTRING org.apache.pig.piggybank.evaluation.string.SUBSTRING();

fixed_length_input = load '/user/hue/emps-fixed.txt' as (row:chararray);

Then build a structure, using the SUBSTRING function to pluck the right values out of each line.  Lastly, just write this new structure into a file on HDFS.
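Here's a sketch of those two steps.  The column offsets, field names, output path, and pipe delimiter below are all assumptions for illustration — adjust them to match the actual layout of your fixed-width file.

Code Block
languagetext
-- hypothetical layout: emp_id in cols 0-6, last_name 6-26,
-- job_title 26-56, mgr_emp_id 56-62
emps = foreach fixed_length_input generate
         (int)TRIM(SUBSTRING(row, 0, 6))   as emp_id,
         TRIM(SUBSTRING(row, 6, 26))       as last_name,
         TRIM(SUBSTRING(row, 26, 56))      as job_title,
         (int)TRIM(SUBSTRING(row, 56, 62)) as mgr_emp_id;

-- write the result out as a pipe-delimited file on HDFS
store emps into '/user/hue/emps-delimited' using PigStorage('|');

TRIM is a Pig built-in that strips the padding spaces a fixed-width format leaves around each value; without it, the numeric casts would fail.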

...

Now you need to create a table based on this delimited file (hint, see Tutorial 1 again for one way to do that) and then run a simple query to verify all works.  I ran the following HiveQL.
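If you'd rather script the table definition than click through the UI, something like the following would work — note the column names, the pipe delimiter, and the HDFS path here are my assumptions, so match them to whatever your converter actually wrote out.

Code Block
languagetext
CREATE EXTERNAL TABLE employees (
  emp_id      INT,
  last_name   STRING,
  job_title   STRING,
  mgr_emp_id  INT
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '|'
LOCATION '/user/hue/emps-delimited';

Making it EXTERNAL means Hive reads the delimited file in place, so dropping the table later won't delete the data on HDFS.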

Code Block
languagetext
select emp_id, last_name, job_title
  from employees
 where mgr_emp_id = 12306;

...

There you have it – a super simple way to get a fixed-width file into a table that you can query from Hive.