Blog from March, 2017

Ephemeral Issue

NOTE: This is a corner-case blog post and really only useful for those who find this entry from a very specific Google search!  Additionally, I'm filing a support case and expect the problem to be resolved long before it becomes a big issue for many.  But... there are those of us on the latest-and-greatest version who get to find this stuff out first.  ;-)

The Problem (in a nutshell)

Storm in HDP 2.5.3 brings in version 1.6.6 of org.slf4j:log4j-over-slf4j instead of the required version 1.7.21.

The Backdrop

I'm running a Storm topology that uses storm-kafka, and I hit the following error when running on HDP 2.5.3.

8487 [Thread-22-kafka-spout-executor[5 5]] ERROR o.a.s.util - Async loop died!
java.lang.NoClassDefFoundError: Could not initialize class org.apache.log4j.Log4jLoggerFactory
    at org.apache.log4j.Logger.getLogger(Logger.java:39) ~[log4j-over-slf4j-1.6.6.jar:1.6.6]
    at kafka.utils.Logging$class.logger(Logging.scala:24) ~[cb1921bc05d611e7b83302d162ad3913.jar:?]
    at kafka.consumer.SimpleConsumer.logger$lzycompute(SimpleConsumer.scala:38) ~[cb1921bc05d611e7b83302d162ad3913.jar:?]
    at kafka.consumer.SimpleConsumer.logger(SimpleConsumer.scala:38) ~[cb1921bc05d611e7b83302d162ad3913.jar:?]
    at kafka.utils.Logging$class.info(Logging.scala:75) ~[cb1921bc05d611e7b83302d162ad3913.jar:?]
    at kafka.consumer.SimpleConsumer.info(SimpleConsumer.scala:38) ~[cb1921bc05d611e7b83302d162ad3913.jar:?]
    at kafka.consumer.SimpleConsumer.liftedTree1$1(SimpleConsumer.scala:110) ~[cb1921bc05d611e7b83302d162ad3913.jar:?]  
    at kafka.consumer.SimpleConsumer.kafka$consumer$SimpleConsumer$$sendRequest(SimpleConsumer.scala:99) ~[cb1921bc05d611e7b83302d162ad3913.jar:?]
    at kafka.consumer.SimpleConsumer.getOffsetsBefore(SimpleConsumer.scala:165) ~[cb1921bc05d611e7b83302d162ad3913.jar:?]
    at kafka.javaapi.consumer.SimpleConsumer.getOffsetsBefore(SimpleConsumer.scala:86) ~[cb1921bc05d611e7b83302d162ad3913.jar:?]
    at org.apache.storm.kafka.KafkaUtils.getOffset(KafkaUtils.java:81) ~[cb1921bc05d611e7b83302d162ad3913.jar:?]
    at org.apache.storm.kafka.KafkaUtils.getOffset(KafkaUtils.java:71) ~[cb1921bc05d611e7b83302d162ad3913.jar:?]
    at org.apache.storm.kafka.PartitionManager.<init>(PartitionManager.java:103) ~[cb1921bc05d611e7b83302d162ad3913.jar:?]
    at org.apache.storm.kafka.ZkCoordinator.refresh(ZkCoordinator.java:98) ~[cb1921bc05d611e7b83302d162ad3913.jar:?]  
    at org.apache.storm.kafka.ZkCoordinator.getMyManagedPartitions(ZkCoordinator.java:69) ~[cb1921bc05d611e7b83302d162ad3913.jar:?]
    at org.apache.storm.kafka.KafkaSpout.nextTuple(KafkaSpout.java:130) ~[cb1921bc05d611e7b83302d162ad3913.jar:?]
    at org.apache.storm.daemon.executor$fn__6505$fn__6520$fn__6551.invoke(executor.clj:651) ~[storm-core-1.0.1.2.5.3.0-37.jar:1.0.1.2.5.3.0-37]
    at org.apache.storm.util$async_loop$fn__554.invoke(util.clj:484) [storm-core-1.0.1.2.5.3.0-37.jar:1.0.1.2.5.3.0-37]
    at clojure.lang.AFn.run(AFn.java:22) [clojure-1.7.0.jar:?]
    at java.lang.Thread.run(Thread.java:745) [?:1.8.0_121]

A fellow Hortonworker, Ambud Sharma, helped me diagnose this (OK... HE diagnosed it and I watched the master walk the Maven dependency graph!).  It turns out I ultimately need version 1.7.21 of org.slf4j:log4j-over-slf4j (not the 1.6.6 version identified in the stack trace above), which I tried to pull into my project by adding the following to my pom.xml dependencies.

        <dependency>
            <groupId>org.slf4j</groupId>
            <artifactId>log4j-over-slf4j</artifactId>
            <version>1.7.21</version>
            <scope>compile</scope>
        </dependency>
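
If you can't immediately spot where the old version is coming from, Maven's dependency plugin will show which version of the artifact your build actually resolves (run this from your topology project's root).

mvn dependency:tree -Dincludes=org.slf4j:log4j-over-slf4j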

Unfortunately, we determined that the storm-core module for HDP 2.5.3 ships version 1.6.6 of this jar, as seen in the (trimmed) ls output below.

[root@ip-172-xxx-xxx-95 ~]# ls /usr/hdp/current/storm-supervisor/lib/
log4j-over-slf4j-1.6.6.jar     ... deleted everything else for brevity ...

I replaced this jar with log4j-over-slf4j-1.7.21.jar and my topology runs fine.
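
In case it helps, here's a minimal sketch of that swap (the source location of the 1.7.21 jar is a placeholder; repeat this on every Storm node and restart the Storm daemons afterwards).

cd /usr/hdp/current/storm-supervisor/lib/
# stash the old jar outside the lib directory rather than deleting it
mv log4j-over-slf4j-1.6.6.jar /tmp/
# /path/to/ is wherever you downloaded the replacement jar
cp /path/to/log4j-over-slf4j-1.7.21.jar .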

HBase AccessDeniedException on a Secure Cluster

So you get your ops guy to stand up a "secure" (aka Kerberos-enabled) Hadoop cluster, and then you try to create a table in the HBase shell.  Lo and behold, you get slammed with an AccessDeniedException like the one shown below.

[student2@ip-172-30-0-42 ~]$ kinit
Password for student2@LAB.HORTONWORKS.NET: 
[student2@ip-172-30-0-42 ~]$ klist
Ticket cache: FILE:/tmp/krb5cc_432201241
Default principal: student2@LAB.HORTONWORKS.NET

Valid starting       Expires              Service principal
03/04/2017 23:28:58  03/05/2017 09:28:58  krbtgt/LAB.HORTONWORKS.NET@LAB.HORTONWORKS.NET
    renew until 03/11/2017 23:28:53
[student2@ip-172-30-0-42 ~]$ hbase shell
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 1.1.2.2.5.3.0-37, rcb8c969d1089f1a34e9df11b6eeb96e69bcf878d, Tue Nov 29 18:48:22 UTC 2016

hbase(main):001:0> whoami
student2@LAB.HORTONWORKS.NET (auth:KERBEROS)
    groups: domain_users

hbase(main):003:0> create 's2test1', 'cf1'

ERROR: org.apache.hadoop.hbase.security.AccessDeniedException: Insufficient permissions (user=student2@LAB.HORTONWORKS.NET, scope=default, params=[namespace=default,table=default:s2test1,family=cf1],action=CREATE)

hbase(main):004:0>

Of course you do – you wanted a SECURE cluster; didn't ya?!?!  LOL!  So, how do you get past this?  Well, if you have Apache Ranger installed it gets a heck of a lot easier, but I don't have it on this cluster.  No biggy, it's still pretty easy.

First, you need to become the hbase user and obtain a valid Kerberos ticket from its headless keytab by doing something like the following.

[root@ip-172-30-0-42 ~]# su - hbase
Last login: Sun Mar  5 00:08:19 UTC 2017 on pts/3
[hbase@ip-172-30-0-42 ~]$ klist -ket /etc/security/keytabs/hbase.headless.keytab
Keytab name: FILE:/etc/security/keytabs/hbase.headless.keytab
KVNO Timestamp           Principal
---- ------------------- ------------------------------------------------------
   0 12/16/2016 19:00:02 hbase-telus_training@LAB.HORTONWORKS.NET (aes128-cts-hmac-sha1-96) 
   0 12/16/2016 19:00:02 hbase-telus_training@LAB.HORTONWORKS.NET (des3-cbc-sha1) 
   0 12/16/2016 19:00:02 hbase-telus_training@LAB.HORTONWORKS.NET (arcfour-hmac) 
   0 12/16/2016 19:00:02 hbase-telus_training@LAB.HORTONWORKS.NET (aes256-cts-hmac-sha1-96) 
   0 12/16/2016 19:00:02 hbase-telus_training@LAB.HORTONWORKS.NET (des-cbc-md5) 
[hbase@ip-172-30-0-42 ~]$ kinit -kt /etc/security/keytabs/hbase.headless.keytab hbase-telus_training
[hbase@ip-172-30-0-42 ~]$ klist
Ticket cache: FILE:/tmp/krb5cc_1017
Default principal: hbase-telus_training@LAB.HORTONWORKS.NET

Valid starting       Expires              Service principal
03/05/2017 00:10:17  03/05/2017 10:10:17  krbtgt/LAB.HORTONWORKS.NET@LAB.HORTONWORKS.NET
    renew until 03/12/2017 00:10:17
[hbase@ip-172-30-0-42 ~]$ hbase shell
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 1.1.2.2.5.3.0-37, rcb8c969d1089f1a34e9df11b6eeb96e69bcf878d, Tue Nov 29 18:48:22 UTC 2016

hbase(main):001:0> whoami
hbase-telus_training@LAB.HORTONWORKS.NET (auth:KERBEROS)
    groups: hadoop

hbase(main):002:0> 

Now you just need to grant out as many permissions as you feel comfortable with.  For my use case, I want to create a unique namespace for each of my student accounts and then give each account admin-level rights to its own namespace so it can create and manage tables in its own bubble.  Here's an example of doing that for just one account: student2.  (In the grant below, RWXC stands for the read, write, execute, and create permissions.)

hbase(main):007:0> create_namespace 's2'
0 row(s) in 0.0240 seconds

hbase(main):008:0> grant 'student2', 'RWXC', '@s2'
0 row(s) in 0.1380 seconds

hbase(main):007:0> 
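
If you have a whole classroom of accounts, a small shell loop run while still holding the hbase ticket can knock out the repetition.  This is just a sketch; the 1..20 range is an assumption, so adjust it for your roster.

# generate the namespace + grant commands for student1..student20 (hypothetical range)
# and pipe them into a non-interactive HBase shell
for i in $(seq 1 20); do
  echo "create_namespace 's${i}'"
  echo "grant 'student${i}', 'RWXC', '@s${i}'"
done | hbase shell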

Now we just need to log back in as student2 and verify that a simple table create, put, and scan can be done.

hbase(main):009:0> exit
[hbase@ip-172-30-0-42 ~]$ exit
logout
[root@ip-172-30-0-42 ~]# kdestroy
kdestroy: No credentials cache found while destroying cache
[root@ip-172-30-0-42 ~]# su - student2
Last login: Sat Mar  4 23:28:43 UTC 2017 on pts/3
[student2@ip-172-30-0-42 ~]$ kinit
Password for student2@LAB.HORTONWORKS.NET: 
[student2@ip-172-30-0-42 ~]$ hbase shell
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 1.1.2.2.5.3.0-37, rcb8c969d1089f1a34e9df11b6eeb96e69bcf878d, Tue Nov 29 18:48:22 UTC 2016

hbase(main):001:0> create 's2:test_table', 'fam'
0 row(s) in 1.4790 seconds

=> Hbase::Table - s2:test_table
hbase(main):002:0> put 's2:test_table', '1', 'fam:name', 'Lester'
0 row(s) in 0.1640 seconds

hbase(main):003:0> scan 's2:test_table'
ROW                      COLUMN+CELL                                                         
 1                       column=fam:name, timestamp=1488673828673, value=Lester              
1 row(s) in 0.0230 seconds

hbase(main):004:0>

Voilà!!

CentOS 7's firewalld vs. Kafka

There I was on an AWS-hosted node trying to reach ports 2181 and 9092 on another AWS node where I had just followed the instructions at http://kafka.apache.org/documentation/#quickstart to get a stand-alone instance of Kafka running.  After some exceptions suggesting I could not reach these ports, I fell back to trusty old telnet to verify that was the problem.

[root@ip-172-xxx-xxx-86 kafka]# telnet kafka 22
Trying 172.xxx.xxx.45...
Connected to kafka.
Escape character is '^]'.
SSH-2.0-OpenSSH_6.6.1
^C
Connection closed by foreign host.
[root@ip-172-xxx-xxx-86 kafka]# 
[root@ip-172-xxx-xxx-86 kafka]# telnet kafka 2181
Trying 172.xxx.xxx.45...
telnet: connect to address 172.xxx.xxx.45: No route to host
[root@ip-172-xxx-xxx-86 kafka]# 

Yep, "no route to host"!!  After double-checking that these two AWS hosts should be able to talk to each other on these ports it seemed logical that the box's firewall might be "helping" me out.  I quickly found out that with CentOS 7, we've moved away from iptables and to firewalld.  Thankfully, man firewall-cmd was there to help me out.  I just needed to add these two ports and the following commands show what the configuration looked like before and after I made these mods.

[root@ip-172-xxx-xxx-45 kafka_2.11-0.10.1.0]# firewall-cmd --info-zone=public
public (active)
  target: default
  icmp-block-inversion: no
  interfaces: eth0
  sources: 
  services: dhcpv6-client ssh vnc-server
  ports: 
  protocols: 
  masquerade: no
  forward-ports: 
  sourceports: 
  icmp-blocks: 
  rich rules: 


[root@ip-172-xxx-xxx-45 kafka_2.11-0.10.1.0]# firewall-cmd --permanent --zone=public --add-port=2181/tcp
success
[root@ip-172-xxx-xxx-45 kafka_2.11-0.10.1.0]# firewall-cmd --permanent --zone=public --add-port=9092/tcp
success
[root@ip-172-xxx-xxx-45 kafka_2.11-0.10.1.0]# firewall-cmd --reload
success
[root@ip-172-xxx-xxx-45 kafka_2.11-0.10.1.0]# firewall-cmd --info-zone=public
public (active)
  target: default
  icmp-block-inversion: no
  interfaces: eth0
  sources: 
  services: dhcpv6-client ssh vnc-server
  ports: 2181/tcp 9092/tcp
  protocols: 
  masquerade: no
  forward-ports: 
  sourceports: 
  icmp-blocks: 
  rich rules: 

[root@ip-172-xxx-xxx-45 kafka_2.11-0.10.1.0]# 
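
For a quicker spot-check than dumping the whole zone, you can ask firewall-cmd for just the ports; given the configuration above, it should come back something like this.

[root@ip-172-xxx-xxx-45 kafka_2.11-0.10.1.0]# firewall-cmd --zone=public --list-ports
2181/tcp 9092/tcp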

Then I was "cooking with bacon", as the telnet checks show!

[root@ip-172-xxx-xxx-86 ~]# telnet kafka 2181
Trying 172.xxx.xxx.45...
Connected to kafka.
Escape character is '^]'.
^CConnection closed by foreign host.
[root@ip-172-xxx-xxx-86 ~]# telnet kafka 9092
Trying 172.xxx.xxx.45...
Connected to kafka.
Escape character is '^]'.
^CConnection closed by foreign host.
[root@ip-172-xxx-xxx-86 ~]# 

More importantly, I could now connect to this new stand-alone Kafka instance I was using for some testing.

[root@ip-172-xxx-xxx-86 kafka]# bin/kafka-console-consumer.sh --zookeeper kafka:2181 --topic lestertester --from-beginning
Now is the time for all good men to come to the aid of their country.
 Now is the time for all good men to come to the aid of their country.
  Now is the time for all good men to come to the aid of their country.
^CProcessed a total of 3 messages
[root@ip-172-xxx-xxx-86 kafka]# 
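
One parting note on firewall-cmd semantics: --permanent only updates the saved configuration, which is why the firewall-cmd --reload above was needed before the ports actually opened.  Dropping --permanent makes the change take effect immediately, but it is lost on the next reload or reboot, so the usual pattern is to do both (or the permanent form plus a reload, as shown above).

# runtime-only: opens the port now, but does not survive a reload
firewall-cmd --zone=public --add-port=2181/tcp
# permanent: saved to disk, applied on the next --reload or reboot
firewall-cmd --permanent --zone=public --add-port=2181/tcp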

Good luck with your use of CentOS 7's firewalld, and don't forget to consult your friendly man page as needed.