create and share a hive udf (the cli is your friend)

As a follow-up to create and share a pig udf (anyone can do it), I thought I'd post a similarly focused write-up on how to put your custom Hive UDF jars on HDFS so that all users can call the functions you create.  As detailed in HIVE-6380, if you are already on Hive 0.13 (HDP 2.1) you can do all of this with a one-liner, CREATE FUNCTION ... USING JAR, which is described at the bottom of the Hive wiki page on the topic.  As for me, I'm using the Hortonworks Sandbox version 2.0 again, so I have to do it with a whopping two lines, as you'll see below.  Also, as with the Pig UDF tutorial, I can't get the example to run successfully with Hue's Hive UI.  So... we'll stick with the CLI for this quick blog post.

First up, make sure you have a Hive table that you can run a SELECT statement on.  I'll be using the employees table I created in how do i load a fixed-width formatted file into hive? (with a little help from pig).

Then we'll need the UDF itself.  As there are already good tutorials on this on the web, I'm just going to use the "to upper" function described in http://gethue.com/hadoop-tutorial-hive-udf-in-1-minute/.  I pulled down the precompiled jar instead of building it (grab it yourself: myudfs.jar) and put it in the /user/hue/shared/hive/udfs HDFS directory.  If you crack the jar open you'll find the source for the actual UDF.
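Getting the jar into that directory takes a few HDFS shell commands.  Here is a minimal sketch, assuming myudfs.jar sits in your current local directory and that you can write to /user/hue/shared/hive/udfs.  The upload_udf_jar function name is made up for illustration, and the chmod is only an assumption about what "shared" needs on your cluster (the parent directories must be readable by other users too).

```shell
# Hypothetical helper (not part of Hive or Hadoop): copy a local UDF jar
# into a shared HDFS directory and make it readable by everyone.
upload_udf_jar() {
  jar="$1"   # local path to the jar, e.g. myudfs.jar
  dir="$2"   # target HDFS directory

  # Create the target directory (and any missing parents).
  hdfs dfs -mkdir -p "$dir" || return 1
  # Upload the jar; this fails if a file with that name is already there.
  hdfs dfs -put "$jar" "$dir/" || return 1
  # Assumption: other users need read access to the jar.
  hdfs dfs -chmod 644 "$dir/$(basename "$jar")" || return 1
  # List the directory to confirm the jar landed.
  hdfs dfs -ls "$dir"
}

# Example call, using the paths from this post:
# upload_udf_jar myudfs.jar /user/hue/shared/hive/udfs
```

With the jar in place, here is the source it contains.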

org.hue.udf.MyUpper
package org.hue.udf;

import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

public final class MyUpper extends UDF {
  public Text evaluate(final Text s) {
    if (s == null) { return null; }
    return new Text(s.toString().toUpperCase() + "-gg");
  }
}

Yes... he's adding an extra -gg at the end, but that will make it that much easier to notice all is working fine.  Now we just need a Hive query to declare this function and then use it.

SelectWithUDFonHDFS.hql
ADD JAR hdfs:///user/hue/shared/hive/udfs/myudfs.jar;
CREATE TEMPORARY FUNCTION my_upper AS 'org.hue.udf.MyUpper';

SELECT first_name, my_upper(last_name) FROM employees;
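For anyone already on Hive 0.13 or later, the one-liner from HIVE-6380 replaces the ADD JAR / CREATE TEMPORARY FUNCTION pair.  The sketch below is untested here, since the Sandbox used in this post is not on Hive 0.13.  It writes a small script (RegisterMyUpper.hql is a made-up file name) that registers my_upper as a permanent function backed by the same jar on HDFS.

```shell
# Write a one-statement Hive script; requires Hive 0.13+ (HIVE-6380).
# RegisterMyUpper.hql is a hypothetical file name chosen for this sketch.
cat > RegisterMyUpper.hql <<'EOF'
CREATE FUNCTION my_upper AS 'org.hue.udf.MyUpper'
  USING JAR 'hdfs:///user/hue/shared/hive/udfs/myudfs.jar';
EOF

# Run it once from the CLI; the function is then stored in the metastore.
# hive -f RegisterMyUpper.hql
```

Because a permanent function is recorded in the metastore (under the current database), the CREATE only needs to run once.  Later sessions should be able to call my_upper(last_name) directly, with no ADD JAR at all.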

Just like with the example in create and share a pig udf (anyone can do it), Hue does not seem to like the SelectWithUDFonHDFS.hql script and errors out very quickly.  Again, bonus points to anyone who runs this one down.  So, we fall back on our trusty CLI to execute the script.

[hue@sandbox ~]$ hive -f SelectWithUDFonHDFS.hql 

... LOTS of lines removed ...

OK
1234567890123456789    1234567890123456789-gg
FIRST-NAME    LASTNAME-gg
Johnny    BEGOOD-gg
Ainta    LISTENING-gg
Neva    MIND-gg
Joseph    BLOW-gg
Sallie    MAE-gg
Bilbo    BAGGINS-gg
Nuther    ONE-gg
Yeta    NOTHERONE-gg
Evenmore    DUMBNAMES-gg
Last    SILLYNAME-gg
Time taken: 20.264 seconds, Fetched: 12 row(s)

As with the Pig UDF, it worked; this time with Hive!  You did it!!  Everything was CAPITALIZED (and got an extra trailing bit of nonsense at no extra charge)!!!  Congratulations!!!!