Ephemeral Issue
NOTE: This is a corner-case blog post and really only useful for those who find this entry from a very specific Google search!! Additionally, I'm filing a support case and expect the problem to be resolved long before it becomes a big issue for many. But... there are those of us on the latest-and-greatest version who get to find this stuff out first. ;-)
The Problem (in a nutshell)
Storm in HDP 2.5.3 brings in version 1.6.6 of org.slf4j:log4j-over-slf4j instead of the required version 1.7.21.
The Backdrop
I'm running a Storm topology that uses storm-kafka, and I ran into the following error when running on HDP 2.5.3.
8487 [Thread-22-kafka-spout-executor[5 5]] ERROR o.a.s.util - Async loop died!
java.lang.NoClassDefFoundError: Could not initialize class org.apache.log4j.Log4jLoggerFactory
	at org.apache.log4j.Logger.getLogger(Logger.java:39) ~[log4j-over-slf4j-1.6.6.jar:1.6.6]
	at kafka.utils.Logging$class.logger(Logging.scala:24) ~[cb1921bc05d611e7b83302d162ad3913.jar:?]
	at kafka.consumer.SimpleConsumer.logger$lzycompute(SimpleConsumer.scala:38) ~[cb1921bc05d611e7b83302d162ad3913.jar:?]
	at kafka.consumer.SimpleConsumer.logger(SimpleConsumer.scala:38) ~[cb1921bc05d611e7b83302d162ad3913.jar:?]
	at kafka.utils.Logging$class.info(Logging.scala:75) ~[cb1921bc05d611e7b83302d162ad3913.jar:?]
	at kafka.consumer.SimpleConsumer.info(SimpleConsumer.scala:38) ~[cb1921bc05d611e7b83302d162ad3913.jar:?]
	at kafka.consumer.SimpleConsumer.liftedTree1$1(SimpleConsumer.scala:110) ~[cb1921bc05d611e7b83302d162ad3913.jar:?]
	at kafka.consumer.SimpleConsumer.kafka$consumer$SimpleConsumer$$sendRequest(SimpleConsumer.scala:99) ~[cb1921bc05d611e7b83302d162ad3913.jar:?]
	at kafka.consumer.SimpleConsumer.getOffsetsBefore(SimpleConsumer.scala:165) ~[cb1921bc05d611e7b83302d162ad3913.jar:?]
	at kafka.javaapi.consumer.SimpleConsumer.getOffsetsBefore(SimpleConsumer.scala:86) ~[cb1921bc05d611e7b83302d162ad3913.jar:?]
	at org.apache.storm.kafka.KafkaUtils.getOffset(KafkaUtils.java:81) ~[cb1921bc05d611e7b83302d162ad3913.jar:?]
	at org.apache.storm.kafka.KafkaUtils.getOffset(KafkaUtils.java:71) ~[cb1921bc05d611e7b83302d162ad3913.jar:?]
	at org.apache.storm.kafka.PartitionManager.<init>(PartitionManager.java:103) ~[cb1921bc05d611e7b83302d162ad3913.jar:?]
	at org.apache.storm.kafka.ZkCoordinator.refresh(ZkCoordinator.java:98) ~[cb1921bc05d611e7b83302d162ad3913.jar:?]
	at org.apache.storm.kafka.ZkCoordinator.getMyManagedPartitions(ZkCoordinator.java:69) ~[cb1921bc05d611e7b83302d162ad3913.jar:?]
	at org.apache.storm.kafka.KafkaSpout.nextTuple(KafkaSpout.java:130) ~[cb1921bc05d611e7b83302d162ad3913.jar:?]
	at org.apache.storm.daemon.executor$fn__6505$fn__6520$fn__6551.invoke(executor.clj:651) ~[storm-core-1.0.1.2.5.3.0-37.jar:1.0.1.2.5.3.0-37]
	at org.apache.storm.util$async_loop$fn__554.invoke(util.clj:484) [storm-core-1.0.1.2.5.3.0-37.jar:1.0.1.2.5.3.0-37]
	at clojure.lang.AFn.run(AFn.java:22) [clojure-1.7.0.jar:?]
	at java.lang.Thread.run(Thread.java:745) [?:1.8.0_121]
A fellow Hortonworker, Ambud Sharma, helped me diagnose this (ok... HE diagnosed this and I watched the master walk the Maven dependency graph!), and it seems that ultimately I need version 1.7.21 (not the 1.6.6 version identified in the stack trace above) of org.slf4j:log4j-over-slf4j, which I tried to include in my project by adding the following to my pom.xml dependencies.
<dependency>
    <groupId>org.slf4j</groupId>
    <artifactId>log4j-over-slf4j</artifactId>
    <version>1.7.21</version>
    <scope>compile</scope>
</dependency>
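If you want to see for yourself which version Maven actually resolves (and which transitive dependency drags in the old one), `mvn dependency:tree -Dverbose -Dincludes=org.slf4j:log4j-over-slf4j` run in the project directory does the trick. The snippet below filters an illustrative, made-up excerpt of that output (not captured from my build) just to show what you'd be looking for.

```shell
# Hypothetical excerpt of `mvn dependency:tree -Dverbose` output; the
# coordinates and scopes here are illustrative, not from my actual build.
sample_tree='[INFO] com.example:my-topology:jar:1.0
[INFO] +- org.apache.storm:storm-core:jar:1.0.1:provided
[INFO] |  \- (org.slf4j:log4j-over-slf4j:jar:1.6.6:compile - omitted for conflict with 1.7.21)
[INFO] \- org.slf4j:log4j-over-slf4j:jar:1.7.21:compile'

# Filtering for the artifact shows the conflict (and the winner) at a glance.
echo "$sample_tree" | grep 'log4j-over-slf4j'
```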
Unfortunately, we determined that the storm-core module for HDP 2.5.3 includes version 1.6.6 of this jar, as seen in the (trimmed) ls output below.
[root@ip-172-xxx-xxx-95 ~]# ls /usr/hdp/current/storm-supervisor/lib/
log4j-over-slf4j-1.6.6.jar
... deleted everything else for brevity ...
I replaced this jar with log4j-over-slf4j-1.7.21.jar and my topology runs fine.
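For anyone wanting to rehearse that swap before touching a live node, here's a minimal sketch run against a scratch directory so it's safe anywhere. On a real supervisor node the directory would be /usr/hdp/current/storm-supervisor/lib/, and you'd want to stop the Supervisor first, keep the backup of the old jar, and repeat on every supervisor node.

```shell
# Rehearse the jar swap against a throwaway directory (safe to run anywhere).
LIB_DIR=$(mktemp -d)                            # stand-in for the Storm lib dir
touch "$LIB_DIR/log4j-over-slf4j-1.6.6.jar"     # pretend this shipped with HDP
touch /tmp/log4j-over-slf4j-1.7.21.jar          # pretend we fetched this from Maven Central

# Keep a backup of the old jar rather than deleting it outright,
# then drop the newer version into place.
mv "$LIB_DIR/log4j-over-slf4j-1.6.6.jar" "$LIB_DIR/log4j-over-slf4j-1.6.6.jar.bak"
cp /tmp/log4j-over-slf4j-1.7.21.jar "$LIB_DIR/"
ls "$LIB_DIR"
```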
So you get your ops guy to stand up a "secure" (aka Kerberos-enabled) Hadoop cluster and then you try to create a table in the shell. Lo and behold, you get slammed with an AccessDeniedException, as shown below.
[student2@ip-172-30-0-42 ~]$ kinit
Password for student2@LAB.HORTONWORKS.NET:
[student2@ip-172-30-0-42 ~]$ klist
Ticket cache: FILE:/tmp/krb5cc_432201241
Default principal: student2@LAB.HORTONWORKS.NET

Valid starting       Expires              Service principal
03/04/2017 23:28:58  03/05/2017 09:28:58  krbtgt/LAB.HORTONWORKS.NET@LAB.HORTONWORKS.NET
	renew until 03/11/2017 23:28:53
[student2@ip-172-30-0-42 ~]$ hbase shell
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 1.1.2.2.5.3.0-37, rcb8c969d1089f1a34e9df11b6eeb96e69bcf878d, Tue Nov 29 18:48:22 UTC 2016

hbase(main):001:0> whoami
student2@LAB.HORTONWORKS.NET (auth:KERBEROS)
    groups: domain_users

hbase(main):003:0> create 's2test1', 'cf1'

ERROR: org.apache.hadoop.hbase.security.AccessDeniedException: Insufficient permissions (user=student2@LAB.HORTONWORKS.NET, scope=default, params=[namespace=default,table=default:s2test1,family=cf1],action=CREATE)

hbase(main):004:0>
Of course you do – you wanted a SECURE cluster, didn't ya?!?! LOL! So, how do you get past this? Well, if you have Apache Ranger installed it gets a heck of a lot easier, but I don't have it here. No biggie, it's still pretty easy.
First, you need to become the hbase user and obtain a valid Kerberos ticket by doing something like the following.
[root@ip-172-30-0-42 ~]# su - hbase
Last login: Sun Mar  5 00:08:19 UTC 2017 on pts/3
[hbase@ip-172-30-0-42 ~]$ klist -ket /etc/security/keytabs/hbase.headless.keytab
Keytab name: FILE:/etc/security/keytabs/hbase.headless.keytab
KVNO Timestamp           Principal
---- ------------------- ------------------------------------------------------
   0 12/16/2016 19:00:02 hbase-telus_training@LAB.HORTONWORKS.NET (aes128-cts-hmac-sha1-96)
   0 12/16/2016 19:00:02 hbase-telus_training@LAB.HORTONWORKS.NET (des3-cbc-sha1)
   0 12/16/2016 19:00:02 hbase-telus_training@LAB.HORTONWORKS.NET (arcfour-hmac)
   0 12/16/2016 19:00:02 hbase-telus_training@LAB.HORTONWORKS.NET (aes256-cts-hmac-sha1-96)
   0 12/16/2016 19:00:02 hbase-telus_training@LAB.HORTONWORKS.NET (des-cbc-md5)
[hbase@ip-172-30-0-42 ~]$ kinit -kt /etc/security/keytabs/hbase.headless.keytab hbase-telus_training
[hbase@ip-172-30-0-42 ~]$ klist
Ticket cache: FILE:/tmp/krb5cc_1017
Default principal: hbase-telus_training@LAB.HORTONWORKS.NET

Valid starting       Expires              Service principal
03/05/2017 00:10:17  03/05/2017 10:10:17  krbtgt/LAB.HORTONWORKS.NET@LAB.HORTONWORKS.NET
	renew until 03/12/2017 00:10:17
[hbase@ip-172-30-0-42 ~]$ hbase shell
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 1.1.2.2.5.3.0-37, rcb8c969d1089f1a34e9df11b6eeb96e69bcf878d, Tue Nov 29 18:48:22 UTC 2016

hbase(main):001:0> whoami
hbase-telus_training@LAB.HORTONWORKS.NET (auth:KERBEROS)
    groups: hadoop

hbase(main):002:0>
Now you just need to grant as many permissions as you feel comfortable with. For my use case, I want to create a unique namespace for each of my student accounts and then give each account admin-level rights to its namespace so it can create and manage tables in its own bubble. Here's an example of doing that for just one account: student2.
hbase(main):007:0> create_namespace 's2'
0 row(s) in 0.0240 seconds

hbase(main):008:0> grant 'student2', 'RWXC', '@s2'
0 row(s) in 0.1380 seconds
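With more than a couple of students, typing that pair of commands per account gets old fast. Here's a minimal sketch that generates the statements for a batch of accounts; the studentN-to-namespace-sN naming matches my lab but is otherwise an assumption, so adjust for your own accounts. You could pipe the output into `hbase shell` (as the hbase user, with a valid ticket) to apply it.

```shell
# Generate create_namespace/grant statements for a batch of student accounts.
# Naming convention (studentN -> namespace sN) is specific to my lab setup.
# To apply for real: <this script> | hbase shell
for i in $(seq 1 3); do
  echo "create_namespace 's$i'"
  echo "grant 'student$i', 'RWXC', '@s$i'"
done
```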
Now, we just need to log back in as student2 and verify that a simple table create, put, and scan can be done.
hbase(main):009:0> exit
[hbase@ip-172-30-0-42 ~]$ exit
logout
[root@ip-172-30-0-42 ~]# kdestroy
kdestroy: No credentials cache found while destroying cache
[root@ip-172-30-0-42 ~]# su - student2
Last login: Sat Mar  4 23:28:43 UTC 2017 on pts/3
[student2@ip-172-30-0-42 ~]$ kinit
Password for student2@LAB.HORTONWORKS.NET:
[student2@ip-172-30-0-42 ~]$ hbase shell
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 1.1.2.2.5.3.0-37, rcb8c969d1089f1a34e9df11b6eeb96e69bcf878d, Tue Nov 29 18:48:22 UTC 2016

hbase(main):001:0> create 's2:test_table', 'fam'
0 row(s) in 1.4790 seconds

=> Hbase::Table - s2:test_table
hbase(main):002:0> put 's2:test_table', '1', 'fam:name', 'Lester'
0 row(s) in 0.1640 seconds

hbase(main):003:0> scan 's2:test_table'
ROW                   COLUMN+CELL
 1                    column=fam:name, timestamp=1488673828673, value=Lester
1 row(s) in 0.0230 seconds

hbase(main):004:0>
Voila!!
There I was on an AWS-hosted node trying to access ports 2181 and 9092 on another AWS node where I had just followed the instructions at http://kafka.apache.org/documentation/#quickstart to get a stand-alone instance of Kafka running. After some exceptions that suggested I could not reach these ports, I fell back to trusty old telnet to verify that was the problem.
[root@ip-172-xxx-xxx-86 kafka]# telnet kafka 22
Trying 172.xxx.xxx.45...
Connected to kafka.
Escape character is '^]'.
SSH-2.0-OpenSSH_6.6.1
^C
Connection closed by foreign host.
[root@ip-172-xxx-xxx-86 kafka]# telnet kafka 2181
Trying 172.xxx.xxx.45...
telnet: connect to address 172.xxx.xxx.45: No route to host
[root@ip-172-xxx-xxx-86 kafka]#
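As an aside, if telnet isn't installed (minimal cloud images often skip it), bash's built-in /dev/tcp pseudo-device gives the same quick yes/no on a port. The hostname below matches my "kafka" box; swap in your own host and ports.

```shell
# Quick TCP reachability check using bash's /dev/tcp pseudo-device;
# handy on minimal images where telnet/nc aren't installed.
# Returns 0 if the port accepts a connection, non-zero otherwise.
check_port() {
  local host=$1 port=$2
  timeout 3 bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null
}

check_port kafka 2181 && echo "2181 open" || echo "2181 closed"
check_port kafka 9092 && echo "9092 open" || echo "9092 closed"
```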
Yep, "no route to host"!! After double-checking that these two AWS hosts should be able to talk to each other on these ports, it seemed logical that the box's firewall might be "helping" me out. I quickly found out that with CentOS 7, we've moved from iptables to firewalld. Thankfully, man firewall-cmd was there to help me out. I just needed to add these two ports, and the following commands show what the configuration looked like before and after I made these mods.
[root@ip-172-xxx-xxx-45 kafka_2.11-0.10.1.0]# firewall-cmd --info-zone=public
public (active)
  target: default
  icmp-block-inversion: no
  interfaces: eth0
  sources:
  services: dhcpv6-client ssh vnc-server
  ports:
  protocols:
  masquerade: no
  forward-ports:
  sourceports:
  icmp-blocks:
  rich rules:

[root@ip-172-xxx-xxx-45 kafka_2.11-0.10.1.0]# firewall-cmd --permanent --zone=public --add-port=2181/tcp
success
[root@ip-172-xxx-xxx-45 kafka_2.11-0.10.1.0]# firewall-cmd --permanent --zone=public --add-port=9092/tcp
success
[root@ip-172-xxx-xxx-45 kafka_2.11-0.10.1.0]# firewall-cmd --reload
success
[root@ip-172-xxx-xxx-45 kafka_2.11-0.10.1.0]# firewall-cmd --info-zone=public
public (active)
  target: default
  icmp-block-inversion: no
  interfaces: eth0
  sources:
  services: dhcpv6-client ssh vnc-server
  ports: 2181/tcp 9092/tcp
  protocols:
  masquerade: no
  forward-ports:
  sourceports:
  icmp-blocks:
  rich rules:

[root@ip-172-xxx-xxx-45 kafka_2.11-0.10.1.0]#
Then I was "cooking with bacon" as telnet shows!
[root@ip-172-xxx-xxx-86 ~]# telnet kafka 2181
Trying 172.xxx.xxx.45...
Connected to kafka.
Escape character is '^]'.
^CConnection closed by foreign host.
[root@ip-172-xxx-xxx-86 ~]# telnet kafka 9092
Trying 172.xxx.xxx.45...
Connected to kafka.
Escape character is '^]'.
^CConnection closed by foreign host.
[root@ip-172-xxx-xxx-86 ~]#
More importantly, I could now connect to the new Kafka instance I stood up for some testing.
[root@ip-172-xxx-xxx-86 kafka]# bin/kafka-console-consumer.sh --zookeeper kafka:2181 --topic lestertester --from-beginning
Now is the time for all good men to come to the aid of their country.
Now is the time for all good men to come to the aid of their country.
Now is the time for all good men to come to the aid of their country.
^CProcessed a total of 3 messages
[root@ip-172-xxx-xxx-86 kafka]#
Good luck on your use of CentOS 7's firewalld and don't forget to consult your friendly man page as needed.