Connecting to a secure Impala via NiFi

Note: This JDBC driver is NOT fully supported by NiFi, so there is no guarantee that more complex features or behaviours will work properly (or at all). It’s a good workaround for simple use cases, but it should not be relied upon heavily.

There are a few different ways to do this. I’ll demonstrate one of them: using ExecuteSQL to connect to Impala via the JDBC driver.

This assumes both Kerberos and TLS are in use. We are also using an internal PKI, so we have to provide custom CACerts via a truststore.

Get the Cloudera JDBC Driver from here.

Unzip it, and find the _ImpalaJDBC4.jar_.

Move this jar to your NiFi extensions directory, which by default is /var/lib/nifi/extensions. The jar must be present on all NiFi nodes.

Now, the file needs to have the appropriate permissions.

chown nifi:nifi /var/lib/nifi/extensions/ImpalaJDBC4.jar
chmod 770 /var/lib/nifi/extensions/ImpalaJDBC4.jar
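If you have more than a couple of nodes, the copy and permission steps above can be scripted. Here is a minimal sketch that only prints the commands for each node, so you can review them before piping the output to sh. The host names, the /tmp staging path, and the assumption of SSH access with passwordless sudo are all mine, not part of the setup described here.

```shell
#!/bin/sh
# Hypothetical node names; replace with your own NiFi hosts.
NIFI_NODES="nifi1.example.com nifi2.example.com"
JAR="ImpalaJDBC4.jar"
DEST="/var/lib/nifi/extensions"

# Print the commands that would install the jar on one node.
# Nothing is executed here; this is a dry run by design.
install_cmds() {
  node="$1"
  # Stage the jar in /tmp on the remote node (assumed writable).
  echo "scp $JAR $node:/tmp/$JAR"
  # install(1) copies the file and sets owner, group and mode in one step,
  # equivalent to the chown and chmod shown above.
  echo "ssh $node sudo install -o nifi -g nifi -m 770 /tmp/$JAR $DEST/$JAR"
}

for node in $NIFI_NODES; do
  install_cmds "$node"
done
```

Once the printed commands look right, running the script as `sh deploy_jar.sh | sh` (the script name is hypothetical) would execute them.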

Add an ExecuteSQL processor to your flow. In the drop-down for the connection pool, add a new service. Select a DBCPConnectionPool service and give it a name.

For the driver location, point it towards the new jar.

/var/lib/nifi/extensions/ImpalaJDBC4.jar

For the driver class name, you can find the options here. We’ll use the following:

com.cloudera.impala.jdbc4.Driver

Now for the connection string. This will depend heavily on your environment. In this case, I am using keytabs, so I will provide a KeytabCredentialsService later on.

jdbc:impala://<IMPALAD COORDINATOR>:21050;AuthMech=1;KrbRealm=<REALM>;KrbHostFQDN=<IMPALAD COORDINATOR>;KrbServiceName=impala;SSL=1;SSLKeyStore=<PATH TO KEYSTORE>/keystore.jks;SSLKeyStorePwd=<KEYSTOREPW>;SSLTrustStore=<PATH TO TRUSTSTORE>/truststore.jks
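A string this long is easy to mistype, so it can help to assemble it from its parts. The function below is only a sketch that reproduces the property layout shown above; the function name, the argument order, and every example value are assumptions for illustration.

```shell
#!/bin/sh
# Build a Kerberos + TLS JDBC URL with the same properties as the string above.
# Arguments: scheme host port realm service keystore keystore_pw truststore
build_jdbc_url() {
  scheme="$1"; host="$2"; port="$3"; realm="$4"; service="$5"
  keystore="$6"; keystore_pw="$7"; truststore="$8"
  # The host is used twice: once as the endpoint and once as KrbHostFQDN.
  printf 'jdbc:%s://%s:%s;AuthMech=1;KrbRealm=%s;KrbHostFQDN=%s;KrbServiceName=%s;SSL=1;SSLKeyStore=%s;SSLKeyStorePwd=%s;SSLTrustStore=%s\n' \
    "$scheme" "$host" "$port" "$realm" "$host" "$service" \
    "$keystore" "$keystore_pw" "$truststore"
}

# Example call with placeholder values (all hypothetical).
build_jdbc_url impala impalad1.example.com 21050 EXAMPLE.COM impala \
  /opt/certs/keystore.jks changeit /opt/certs/truststore.jks
```

Passing a different scheme and service name (for example hive2 and hive) gives the equivalent string for another driver that accepts the same properties.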

This is all that’s needed for the DBCPConnectionPool. Save this and go back to the ExecuteSQL processor.

In the Kerberos Credentials Service, add a new KeytabCredentialsService. Provide a valid principal and keytab file, and ensure the principal actually has permissions in Impala.
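Before blaming NiFi for an authentication failure, it is worth checking the keytab from the command line. The sketch below assumes the MIT Kerberos client tools (klist and kinit) are installed on the NiFi node; the function name and the example path and principal are hypothetical.

```shell
#!/bin/sh
# check_keytab KEYTAB PRINCIPAL
# Returns 0 only if the keytab is readable, contains the principal,
# and can be used to obtain a ticket.
check_keytab() {
  keytab="$1"; principal="$2"
  if [ ! -r "$keytab" ]; then
    echo "keytab not readable: $keytab" >&2
    return 1
  fi
  # List the keytab entries and confirm the principal is among them.
  if ! klist -kt "$keytab" | grep -qF "$principal"; then
    echo "principal $principal not found in $keytab" >&2
    return 1
  fi
  # Try to obtain a ticket with the keytab; this writes to the
  # default credential cache of the calling user.
  kinit -kt "$keytab" "$principal"
}
```

Run it as the nifi user, since that is the account NiFi will read the keytab with, for example: check_keytab /etc/security/keytabs/nifi.keytab nifi@EXAMPLE.COM. This only confirms that Kerberos authentication works; it does not check the principal's permissions in Impala.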

Add a simple query in the select query field and give it a test.

FYI: The same technique works for Hive2, which is handy because NiFi does not work with Hive2 (an odd decision there). Obtain the Hive JDBC driver from here. Apart from some slight changes to the connection string, the setup is exactly the same.

jdbc:hive2://<HS2 SERVER>:21050;AuthMech=1;KrbRealm=<REALM>;KrbHostFQDN=<HS2 SERVER>;KrbServiceName=hive;SSL=1;SSLKeyStore=<PATH TO KEYSTORE>/keystore.jks;SSLKeyStorePwd=<KEYSTOREPW>;SSLTrustStore=<PATH TO TRUSTSTORE>/truststore.jks