Feature request: read CSV from an hdfs:// URL
When running pandas in AWS, the following works perfectly fine:
pd.read_csv("s3://mybucket/data.csv")
But the following does not:
pd.read_csv("hdfs:///tmp/data.csv")
It would be a good user experience to support the hdfs:// scheme too, similar to how http, ftp, s3, and file are valid schemes right now.
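Pandas decides how to open a path by inspecting the URL scheme. The sketch below uses hypothetical helper names (not pandas internals) to show the kind of dispatch this request amounts to, with hdfs added alongside the existing schemes:

```python
from urllib.parse import urlparse

# Hypothetical sketch of scheme-based dispatch; names are illustrative,
# not actual pandas internals.
SUPPORTED_SCHEMES = {"http", "https", "ftp", "s3", "file", "hdfs"}

def infer_scheme(path: str) -> str:
    """Return the URL scheme for a path, defaulting to local 'file'."""
    scheme = urlparse(path).scheme
    return scheme if scheme else "file"

def is_remote(path: str) -> bool:
    """A path is remote if its scheme is anything other than 'file'."""
    return infer_scheme(path) != "file"
```

With such a table, adding hdfs support is a matter of registering one more scheme rather than special-casing each backend.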
Issue Analytics
- State:
- Created: 6 years ago
- Reactions: 1
- Comments: 20 (13 by maintainers)
Top Results From Across the Web
How to read a CSV file from HDFS using PySpark - ProjectPro
This recipe helps you read a CSV file from HDFS using PySpark. ... Read the CSV file into a dataframe using the function...
Spark Read Files from HDFS (TXT, CSV, AVRO, PARQUET ...)
Use the textFile() and wholeTextFiles() methods of the SparkContext to read files from any file system; to read from HDFS, you need to...
Load file csv from hdfs - Stack Overflow
Create a Destination HDFS directory first. It looks like /user/hduser directory is not present in HDFS. hdfs dfs -mkdir -p /user/hduser.
[Hadoop HDFS] Read CSV Data - HULFT
Reads CSV format files from HDFS. Data Model. Data model of this component is Table Model type. Properties. For information about using variables,...
Python & HDFS. Read and write data from HDFS using…
After instantiating the HDFS client, use the write() function to write this Pandas Dataframe into HDFS with CSV format. from hdfs.ext.kerberos ...
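The hdfs client's write() call in that snippet ultimately just hands the CSV writer a file-like object. The same pattern can be sketched with the standard library alone, no HDFS required; here a StringIO stands in for the handle the HDFS client would yield:

```python
import csv
import io

def write_rows_as_csv(handle, rows, header):
    """Write rows as CSV to any file-like object, e.g. one yielded by
    an HDFS client's write() context manager."""
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)

# StringIO stands in for the HDFS-backed handle in this sketch.
buf = io.StringIO()
write_rows_as_csv(buf, [(1, 2), (3, 4)], ["a", "b"])
```

Because the writer only needs a write() method, swapping the buffer for a real HDFS handle changes nothing in the CSV logic.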
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
Note that pyarrow's HDFS interface will be deprecated at some point. The "legacy" interface will likely remain for a while, but fsspec will need its shim rewritten against pyarrow's newer filesystem once that is stable. Hopefully, this shouldn't affect users.
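The shim that comment mentions is essentially a scheme-to-filesystem registry. A minimal stdlib-only sketch of that pattern (hypothetical names, not fsspec's actual API) looks like:

```python
import io
from urllib.parse import urlparse

# Minimal sketch of a protocol registry, the pattern fsspec follows;
# class and function names here are illustrative only.
_registry = {}

def register_filesystem(scheme, factory):
    """Map a URL scheme (e.g. 'hdfs') to a filesystem factory."""
    _registry[scheme] = factory

def open_url(url, mode="r"):
    """Look up the filesystem for the URL's scheme and open the path."""
    parsed = urlparse(url)
    fs = _registry[parsed.scheme]()
    return fs.open(parsed.path, mode)

class InMemoryFS:
    """Toy filesystem standing in for a real HDFS-backed one."""
    files = {"/tmp/data.csv": "a,b\n1,2\n"}

    def open(self, path, mode="r"):
        return io.StringIO(self.files[path])

register_filesystem("hdfs", InMemoryFS)
```

Rewriting the shim against pyarrow's newer filesystem then means swapping the factory registered for "hdfs", while callers of open_url are untouched.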
Tests can be written in a similar manner to those here: https://github.com/pandas-dev/pandas/blob/master/pandas/tests/io/test_gcs.py
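In the same spirit as those GCS tests, an hdfs:// test can mock the filesystem layer rather than talk to a real cluster. A stdlib-only sketch, where read_text and its injectable opener are made-up names for illustration:

```python
import io
from unittest import mock

def read_text(path, opener=open):
    """Hypothetical reader that opens a path via an injectable opener,
    the seam a test can replace with a fake HDFS client."""
    with opener(path) as f:
        return f.read()

def test_read_from_fake_hdfs():
    # The mock plays the HDFS client's context-manager-returning open().
    fake_open = mock.MagicMock()
    fake_open.return_value.__enter__.return_value = io.StringIO("a,b\n1,2\n")
    fake_open.return_value.__exit__.return_value = False
    assert read_text("hdfs:///tmp/data.csv", opener=fake_open) == "a,b\n1,2\n"
    fake_open.assert_called_once_with("hdfs:///tmp/data.csv")
```

The assertion on the call arguments verifies the hdfs:// URL reaches the filesystem layer unmodified, which is the behavior the feature request asks for.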