
Full Support for Kerberos secured Hadoop Cluster


Hello everyone, I recently tried to set up petastorm on my company’s Hadoop cluster. Because the cluster uses Kerberos for authentication, petastorm did not work. I found that petastorm relies on pyarrow, which does support Kerberos authentication.

I hacked “petastorm/petastorm/hdfs/namenode.py” line 250 and replaced it with

driver = 'libhdfs'
return pyarrow.hdfs.connect("clusterurl", clusterport, user=username, kerb_ticket=ticket_cache_path, driver=driver)

This setup made it possible to work on a Kerberos-secured cluster. Better Kerberos support in a future petastorm release would be great, since pyarrow, which petastorm depends on, already works fine with Kerberos.

I also used libhdfs instead of libhdfs3 because libhdfs3 doesn’t work for me.

Kind regards, Jonathan
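
The hack described in the post above can be sketched as a small standalone helper rather than an edit to petastorm's source. This is only an illustration under stated assumptions: the function names kerberos_connect_kwargs and connect_kerberized_hdfs are hypothetical and not part of petastorm, falling back to the KRB5CCNAME environment variable is an added convenience not mentioned in the issue, and pyarrow.hdfs.connect is the legacy pyarrow API used in the issue (newer pyarrow releases expose pyarrow.fs.HadoopFileSystem, which also accepts kerb_ticket).

```python
import os


def kerberos_connect_kwargs(host, port, user=None, ticket_cache=None, driver="libhdfs"):
    """Build keyword arguments for the legacy pyarrow.hdfs.connect call.

    Hypothetical helper, not part of petastorm or pyarrow.
    """
    if ticket_cache is None:
        # Assumption: fall back to the standard Kerberos credential cache variable.
        ticket_cache = os.environ.get("KRB5CCNAME")
    if ticket_cache and ticket_cache.startswith("FILE:"):
        # KRB5CCNAME may carry a "FILE:" type prefix; keep only the path.
        ticket_cache = ticket_cache[len("FILE:"):]
    return {
        "host": host,
        "port": port,
        "user": user,
        "kerb_ticket": ticket_cache,
        "driver": driver,  # the issue author used libhdfs rather than libhdfs3
    }


def connect_kerberized_hdfs(host, port, **options):
    """Open an HDFS connection using the same call as in the issue."""
    # Imported lazily so the helper above can be used without pyarrow installed.
    from pyarrow import hdfs
    return hdfs.connect(**kerberos_connect_kwargs(host, port, **options))
```

A call such as connect_kerberized_hdfs("clusterurl", clusterport, user=username, ticket_cache=ticket_cache_path) would then mirror the two lines quoted in the issue.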

Issue Analytics

  • State: open
  • Created 5 years ago
  • Comments: 11

Top GitHub Comments

1 reaction
selitvin commented, Jan 2, 2019

Jonathan,

Thanks for the suggestion. Let’s get Kerberos authentication supported then.

0 reactions
cupdike commented, Jan 15, 2020

I was able to get past this error. It seems it was caused by the file supplied as kerbTicketCachePath not actually being at the specified location. I was using the --files option with spark-submit when it should have been --py-files.
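
The failure mode described in this comment, a ticket cache that is not where the configured path says it is, can be surfaced early with a small check before connecting. This is a minimal sketch, not something from petastorm or the thread: require_ticket_cache is a hypothetical helper name, and printing the working directory in the error is an assumption about what helps when debugging where spark-submit placed a shipped file.

```python
import os


def require_ticket_cache(path):
    """Return path if it points to a readable file, otherwise raise a clear error.

    Hypothetical helper for illustration only.
    """
    if not os.path.isfile(path):
        # Include the working directory, since relative paths resolve against it.
        raise FileNotFoundError(
            "Kerberos ticket cache not found at %r (working directory: %s)"
            % (path, os.getcwd())
        )
    if not os.access(path, os.R_OK):
        raise PermissionError("Kerberos ticket cache at %r is not readable" % path)
    return path
```

Calling this on the path before passing it as the ticket cache argument turns a confusing authentication failure into an explicit missing-file error.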


Top Results From Across the Web

Apache Hadoop 3.3.4 – Hadoop in Secure Mode
This document describes how to configure authentication for Hadoop in secure mode. When Hadoop is configured to run in secure mode, each Hadoop...
How to Configure Clusters to Use Kerberos for Authentication
Go to the HBase Service > Configuration tab and click View and Edit. In the Search field, type HBase Secure to show the...
Cloudera secure Hadoop cluster support (Kerberos)
A secure Hadoop cluster is a cluster that relies on Kerberos for user and services (such as HiveServer, HDFS) authentication/authorization.
Hadoop and Kerberos: The Madness Beyond the Gate
Kerberos allows different realms to have some form of trust of others. This would allow a Hadoop cluster with its own KDC and...
Set up for a Kerberos-enabled Hadoop cluster
Configuration by Hadoop Distribution · Configure Hadoop Authentication · Set up for a Kerberos-enabled Hadoop cluster · Enable HttpFS · Enable ...
