Connecting to a secure Impala via NiFi with TLS and Kerberos (and Hive2)
Note: Please be aware that this JDBC driver is NOT fully supported by NiFi, so there is no guarantee that more complex features or behaviours will work properly (or at all). It's a good workaround for simple use cases, but it should not be relied upon heavily.
There are a few different ways this could be done. I'll demonstrate one of them: using ExecuteSQL to connect to Impala via the JDBC driver.
This assumes both Kerberos and TLS are in use. We are also using an internal PKI, so we have to provide custom CA certificates via a truststore.
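If you don't already have a truststore containing your internal CA, one way to build it is with the JDK's keytool. This is a minimal sketch, assuming the CA certificate is available as a PEM file; the helper name build_truststore, the alias internal-ca and all paths are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: create (or extend) a JKS truststore holding the internal CA certificate.
# build_truststore and the alias "internal-ca" are hypothetical names.
build_truststore() {
  local ca_cert="$1" truststore="$2" storepass="$3"
  # -storetype JKS keeps the .jks format used in the connection string later on;
  # -noprompt skips the interactive "Trust this certificate?" question.
  keytool -importcert -noprompt -trustcacerts \
    -alias internal-ca \
    -file "$ca_cert" \
    -keystore "$truststore" \
    -storetype JKS \
    -storepass "$storepass"
}
```

For example, build_truststore /path/to/internal-ca.pem /path/to/truststore.jks <TRUSTSTOREPW>. Copy the resulting truststore to every NiFi node and make sure the nifi user can read it.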
Get the Cloudera JDBC Driver for Impala from Cloudera's website.
Unzip it and find the driver jar (ImpalaJDBC4.jar in this post).
Move this jar to your NiFi extensions directory, by default /var/lib/nifi/extensions. It needs to be on all NiFi nodes.
Now, the file needs to have the appropriate permissions.
chown nifi:nifi /var/lib/nifi/extensions/ImpalaJDBC4.jar
chmod 770 /var/lib/nifi/extensions/ImpalaJDBC4.jar
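Since the jar has to be on every node, a small loop saves some repetition. This sketch assumes passwordless SSH and sudo rights on each node, and the helper name deploy_driver is hypothetical. It uses install, which copies the file and sets owner and mode in one step, equivalent to the chown/chmod pair above.

```shell
#!/usr/bin/env bash
# Sketch: push the Impala JDBC jar to every NiFi node.
# Assumes passwordless SSH and sudo on each node; deploy_driver is a hypothetical helper.
EXT_DIR=/var/lib/nifi/extensions

deploy_driver() {
  local jar="$1"; shift
  local name node
  name="$(basename "$jar")"
  for node in "$@"; do
    # Stage the jar in /tmp on the remote node.
    scp "$jar" "${node}:/tmp/${name}" || return 1
    # Copy it into the extensions dir as nifi:nifi with mode 770, then clean up.
    ssh "$node" "sudo install -o nifi -g nifi -m 770 /tmp/${name} ${EXT_DIR}/${name} && rm -f /tmp/${name}" || return 1
  done
}
```

Usage: deploy_driver ./ImpalaJDBC4.jar <NIFI NODE 1> <NIFI NODE 2>, listing every NiFi node in the cluster.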
Add an ExecuteSQL processor to your flow. In the drop-down for Database Connection Pooling Service, create a new service, select DBCPConnectionPool and give it a name.
For Database Driver Location(s), point it at the new jar.
For Database Driver Class Name, the available options are listed in the driver's documentation. We'll use the following:
Now for the connection string (the Database Connection URL property). This depends heavily on your environment. In this case I am using keytabs, so I will provide a KeytabCredentialsService later on.
jdbc:impala://<IMPALAD COORDINATOR>:21050;AuthMech=1;KrbRealm=<REALM>;KrbHostFQDN=<IMPALAD COORDINATOR>;KrbServiceName=impala;SSL=1;SSLKeyStore=<PATH TO KEYSTORE>/keystore.jks;SSLKeyStorePwd=<KEYSTOREPW>;SSLTrustStore=<PATH TO TRUSTSTORE>/truststore.jks
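A stray space or a missing semicolon in this string is easy to overlook, so it can help to assemble it from its parts. The sketch below reproduces the URL above; the helper name impala_url and its argument order are made up for illustration. In the Cloudera driver, AuthMech=1 selects Kerberos and SSL=1 turns on TLS.

```shell
#!/usr/bin/env bash
# Sketch: build the Impala JDBC URL used in this post from its parts.
# impala_url is a hypothetical helper; the property names match the URL above.
impala_url() {
  local host="$1" realm="$2" keystore="$3" keystore_pw="$4" truststore="$5"
  local -a props
  props=(
    "AuthMech=1"                  # Kerberos authentication
    "KrbRealm=${realm}"
    "KrbHostFQDN=${host}"         # host part of the impala/<host>@<REALM> principal
    "KrbServiceName=impala"
    "SSL=1"                       # enable TLS
    "SSLKeyStore=${keystore}"
    "SSLKeyStorePwd=${keystore_pw}"
    "SSLTrustStore=${truststore}"
  )
  # "${props[*]}" joins the array with the first character of IFS: ';', no spaces.
  local IFS=';'
  printf 'jdbc:impala://%s:21050;%s\n' "$host" "${props[*]}"
}
```

Paste the output into the Database Connection URL property of the DBCPConnectionPool.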
This is all that's needed for the DBCPConnectionPool. Save it and go back to the ExecuteSQL processor.
For the Kerberos Credentials Service property, add a new KeytabCredentialsService. Provide a valid principal and keytab file, and make sure the principal actually has permissions in Impala.
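If the connection fails with a Kerberos error, it is worth checking the keytab outside NiFi first. A minimal sketch using the MIT Kerberos klist and kinit commands; it requests the ticket into a throwaway credential cache so any existing ticket is left alone. The helper name check_keytab is hypothetical. Running the same commands as the nifi user also confirms that NiFi can read the keytab.

```shell
#!/usr/bin/env bash
# Sketch: confirm a keytab contains the principal and can actually obtain a ticket.
# check_keytab is a hypothetical helper built on MIT klist/kinit.
check_keytab() {
  local keytab="$1" principal="$2" ccdir rc
  # 'klist -kt' lists the principals stored in the keytab.
  if ! klist -kt "$keytab" | grep -qF "$principal"; then
    echo "principal ${principal} not found in ${keytab}" >&2
    return 1
  fi
  # Request a ticket into a temporary file cache, then throw the cache away.
  ccdir="$(mktemp -d)"
  kinit -kt "$keytab" -c "FILE:${ccdir}/cc" "$principal"
  rc=$?
  rm -rf "$ccdir"
  return "$rc"
}
```

For example, check_keytab /path/to/nifi.keytab <PRINCIPAL>@<REALM> returns 0 only if the principal is in the keytab and the KDC issues a ticket for it.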
Enter a simple query (for example, SELECT 1) in the SQL select query field and run it to test the connection.
FYI: the same technique works for Hive2, which is handy because NiFi does not support Hive2 out of the box (an odd decision). Get the Cloudera Hive JDBC driver from Cloudera's website; apart from a slightly different connection string, the setup is exactly the same. Note that HiveServer2 listens on port 10000 by default, not Impala's 21050:
jdbc:hive2://<HS2 SERVER>:10000;AuthMech=1;KrbRealm=<REALM>;KrbHostFQDN=<HS2 SERVER>;KrbServiceName=hive;SSL=1;SSLKeyStore=<PATH TO KEYSTORE>/keystore.jks;SSLKeyStorePwd=<KEYSTOREPW>;SSLTrustStore=<PATH TO TRUSTSTORE>/truststore.jks
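When the driver reports an SSL error against HiveServer2 (or impalad), one way to see whether the server certificate actually chains to your internal CA is openssl s_client. This sketch assumes the CA certificate is also available as a PEM file, since openssl cannot read the JKS truststore directly; the helper name check_tls is hypothetical. Pass whichever host and port your connection string uses.

```shell
#!/usr/bin/env bash
# Sketch: check that a TLS endpoint's certificate verifies against the internal CA.
# check_tls is a hypothetical helper; ca_pem is a PEM copy of the internal CA cert.
# This checks the certificate chain only, not that the certificate matches the hostname.
check_tls() {
  local host="$1" port="$2" ca_pem="$3" out
  # </dev/null makes s_client exit right after the handshake instead of waiting for input.
  out="$(openssl s_client -connect "${host}:${port}" -servername "$host" \
          -CAfile "$ca_pem" </dev/null 2>&1)"
  # s_client prints "Verify return code: 0 (ok)" when the chain validates.
  if grep -q 'Verify return code: 0 (ok)' <<<"$out"; then
    echo "TLS OK: ${host}:${port}"
  else
    grep 'Verify return code' <<<"$out" >&2
    return 1
  fi
}
```

For example, check_tls <HS2 SERVER> <PORT> /path/to/internal-ca.pem prints "TLS OK" when the chain validates and the verification error otherwise.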