Full Support for Kerberos secured Hadoop Cluster
Hello everyone, I recently tried to set up petastorm on my company’s Hadoop cluster. However, because the cluster uses Kerberos for authentication, petastorm failed. I figured out that petastorm relies on pyarrow, which actually supports Kerberos authentication.
I hacked “petastorm/petastorm/hdfs/namenode.py” line 250 and replaced it with
driver = 'libhdfs'
return pyarrow.hdfs.connect("clusterurl", clusterport, user=username, kerb_ticket=ticket_cache_path, driver=driver)
This setup made it possible to work on a Kerberos-secured cluster. For future releases, better Kerberos support in petastorm would be great, since pyarrow, which petastorm depends on, works fine with Kerberos.
I also used libhdfs instead of libhdfs3 because libhdfs3 doesn’t work for me.
Kind regards, Jonathan
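For readers who want to try the workaround described above outside of petastorm, here is a minimal sketch. It assumes the legacy `pyarrow.hdfs.connect` call quoted in the issue, whose `driver` argument only exists in older pyarrow releases. The helper names `kerberos_connect_kwargs` and `connect_kerberized_hdfs` are hypothetical, and falling back to the `KRB5CCNAME` environment variable is an assumption, not something the issue describes.

```python
import os


def kerberos_connect_kwargs(host, port, user, ticket_cache_path=None, driver="libhdfs"):
    """Build keyword arguments for the legacy pyarrow.hdfs.connect call.

    Hypothetical helper: it only assembles the arguments used in the issue.
    """
    if ticket_cache_path is None:
        # Assumption: fall back to the standard Kerberos cache variable,
        # which may carry a "FILE:" prefix.
        ticket_cache_path = os.environ.get("KRB5CCNAME", "")
        if ticket_cache_path.startswith("FILE:"):
            ticket_cache_path = ticket_cache_path[len("FILE:"):]
    if not ticket_cache_path:
        raise ValueError("no Kerberos ticket cache path given or found in KRB5CCNAME")
    return {
        "host": host,
        "port": port,
        "user": user,
        "kerb_ticket": ticket_cache_path,
        "driver": driver,  # the issue author used libhdfs because libhdfs3 did not work
    }


def connect_kerberized_hdfs(host, port, user, ticket_cache_path=None):
    """Open an HDFS connection the way the issue's patched line does."""
    # Imported lazily so the argument helper can be used without pyarrow installed.
    import pyarrow

    return pyarrow.hdfs.connect(
        **kerberos_connect_kwargs(host, port, user, ticket_cache_path)
    )
```

Calling `connect_kerberized_hdfs` needs a reachable cluster, a valid ticket (for example from `kinit`), and a pyarrow version that still ships this legacy API.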
Issue Analytics
- State:
- Created 5 years ago
- Comments:11
Top GitHub Comments
Jonathan,
Thanks for the suggestion. Let’s get Kerberos authentication supported then.
I was able to get past this error. It seems it was actually caused by the file supplied to kerbTicketCachePath not being at the specified location. I was using the --files option with spark-submit when it should have been --py-files.
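A missing ticket cache file like the one described in this comment can be caught before the HDFS connection is attempted. The following is only a sketch: `require_ticket_cache` is a hypothetical helper, not part of petastorm, pyarrow, or Spark, and it simply checks that the path exists on the machine where the code runs.

```python
import os


def require_ticket_cache(path):
    """Return the absolute ticket cache path, or raise if the file is missing."""
    resolved = os.path.abspath(os.path.expanduser(path))
    if not os.path.isfile(resolved):
        # Include the working directory, since relative paths resolve
        # differently on a driver and on remote workers.
        raise FileNotFoundError(
            "Kerberos ticket cache not found at %s (cwd: %s)" % (resolved, os.getcwd())
        )
    return resolved
```

Running this check on the process that opens the connection makes the failure explicit, rather than leaving it to surface as an authentication error later.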