WORK IN PROGRESS!!!!

 

The Pig project's User Defined Functions page gives a pretty good overview of how to create a UDF.  In fact, I stole my simple UDF from there.  For Pig UDFs, the "Hello World" program is actually a "Convert to Upper Case" function.  For this effort, I'm using the Hortonworks Sandbox (version 2.0).  Once you have that setup operational, follow along and we'll get your first UDF created and placed on HDFS where others can easily share it.

...

Now that we've got it created, let's share it.  The best way to make it accessible to everyone is to put the jar file on HDFS itself.  Since we are using the Sandbox, we could just use Hue, but everything is always more fun at the command line.

Code Block
language: bash
[hue@sandbox ~]$ hadoop fs -mkdir shared
[hue@sandbox ~]$ hadoop fs -mkdir shared/pig
[hue@sandbox ~]$ hadoop fs -mkdir shared/pig/udfs
[hue@sandbox ~]$ ls -l *.jar
-rw-rw-r-- 1 hue hue 1534 Mar 29 00:54 exampleudf.jar
[hue@sandbox ~]$ hadoop fs -put exampleudf.jar shared/pig/udfs/exampleudf.jar
[hue@sandbox ~]$ hadoop fs -ls /user/hue/shared/pig/udfs
Found 1 items
-rw-r--r--   3 hue hue       1534 2014-03-29 00:59 /user/hue/shared/pig/udfs/exampleudf.jar
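
The three separate mkdir calls above can probably be collapsed into one. The following is a minimal sketch, assuming a Hadoop 2.x shell where `hadoop fs -mkdir -p` creates missing parent directories; the variable names are mine, and the guard around the `hadoop` calls is only there so the snippet does nothing harmful on a machine without a Hadoop client.

```shell
#!/bin/sh
# Relative HDFS paths resolve under the current user's home, e.g. /user/hue.
UDF_DIR="shared/pig/udfs"
JAR="exampleudf.jar"
TARGET="$UDF_DIR/$JAR"

if command -v hadoop >/dev/null 2>&1; then
    # -p creates shared, shared/pig and shared/pig/udfs in one call (assumes Hadoop 2.x).
    hadoop fs -mkdir -p "$UDF_DIR"
    # Copy the local jar into the shared directory.
    hadoop fs -put "$JAR" "$TARGET"
    # -test -e exits 0 only if the path exists on HDFS.
    hadoop fs -test -e "$TARGET" && echo "uploaded $TARGET"
else
    echo "hadoop client not found; would upload $JAR to $TARGET"
fi
```

Run it from the directory that holds exampleudf.jar, as the same user (hue in this walkthrough) whose HDFS home should contain the shared directory.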

...

Code Block
language: text
title: test-UPPER.pig
REGISTER 'hdfs:///user/hue/shared/pig/udfs/exampleudf.jar';
DEFINE SIMPLEUPPER exampleudf.UPPER();

typing_line = LOAD '/user/hue/testData/typingText.txt' AS (row:chararray);

upper_typing_line = FOREACH typing_line GENERATE SIMPLEUPPER(row);

DUMP upper_typing_line;

The logical thing would be to use the Pig UI component of Hue to run this super simple script, but it complains with the following error each time, and I simply cannot figure out why.

Code Block
language: text
2014-03-29 01:15:19,712 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2999: Unexpected internal error. Pathname /tmp/udfs/'hdfs:/user/hue/shared/pig/udfs/exampleudf.jar' from hdfs://sandbox.hortonworks.com:8020/tmp/udfs/'hdfs:/user/hue/shared/pig/udfs/exampleudf.jar' is not a valid DFS filename.
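
One detail in the error above: the rejected pathname still contains the single quotes from the REGISTER line, appended to /tmp/udfs/. I have not confirmed the cause, but an experiment worth trying is a copy of the script whose REGISTER path is unquoted. The sketch below is an untested guess, not a known fix; it recreates the script locally so it is self-contained, and the output file name test-UPPER-unquoted.pig is hypothetical.

```shell
#!/bin/sh
# Recreate the original script exactly as shown earlier on this page.
cat > test-UPPER.pig <<'EOF'
REGISTER 'hdfs:///user/hue/shared/pig/udfs/exampleudf.jar';
DEFINE SIMPLEUPPER exampleudf.UPPER();

typing_line = LOAD '/user/hue/testData/typingText.txt' AS (row:chararray);

upper_typing_line = FOREACH typing_line GENERATE SIMPLEUPPER(row);

DUMP upper_typing_line;
EOF

# Strip the single quotes from the REGISTER line only; every other line is copied unchanged.
sed "s/^REGISTER '\(.*\)';/REGISTER \1;/" test-UPPER.pig > test-UPPER-unquoted.pig

# Show the rewritten statement for a quick visual check.
grep '^REGISTER' test-UPPER-unquoted.pig
```

The contents of test-UPPER-unquoted.pig can then be pasted into the Hue Pig editor to see whether the error changes.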

...