Getting an error when trying to write a .txt file from the local filesystem to WebHDFS

I'm getting the following error when trying to write a text file from the local filesystem to WebHDFS: TypeError: can only concatenate str (not "bytes") to str

Hoping that I am just doing something wrong. My code is as follows:

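from smart_open import open

def smart_copy(source_file, sync_file):
    # Read the local file in binary mode and stream it, line by line, to the target URI.
    with open(source_file, 'rb') as source:
        with open(sync_file, 'wb') as sync:
            for line in source:
                sync.write(line)

# {username}, {host} and {port} are placeholders for the WebHDFS user, host and port.
smart_copy('./test_file.txt', 'webhdfs://{username}@{host}:{port}/user/XXXX/smart_copy/test_file.txt')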

The stack trace for the error is:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-40-b63bd711d13c> in <module>
----> 1 smart_copy('./test_file.txt', 'webhdfs://XXXX@XXX.XXX.XXX.XXX:XXXXX/user/XXXX/smart_copy/test_file.txt')

<ipython-input-38-694f70cf0776> in smart_copy(source_file, sync_file)
      3     '''
      4     with open(source_file, 'rb') as source:
----> 5         with open(sync_file, 'wb') as sync:
      6             for line in source:
      7                 sync.write(line)

~/Github/tools/.venv/lib/python3.7/site-packages/smart_open/smart_open_lib.py in open(uri, mode, buffering, encoding, errors, newline, closefd, opener, ignore_ext, transport_params)
    346     except KeyError:
    347         binary_mode = mode
--> 348     binary, filename = _open_binary_stream(uri, binary_mode, transport_params)
    349     if ignore_ext:
    350         decompressed = binary

~/Github/tools/.venv/lib/python3.7/site-packages/smart_open/smart_open_lib.py in _open_binary_stream(uri, mode, transport_params)
    560         elif parsed_uri.scheme == "webhdfs":
    561             kw = _check_kwargs(smart_open_webhdfs.open, transport_params)
--> 562             return smart_open_webhdfs.open(parsed_uri.uri_path, mode, **kw), filename
    563         elif parsed_uri.scheme.startswith('http'):
    564             #

~/Github/tools/.venv/lib/python3.7/site-packages/smart_open/webhdfs.py in open(uri, mode, min_part_size)
     40         return BufferedInputBase(uri)
     41     elif mode == 'wb':
---> 42         return BufferedOutputBase(uri, min_part_size=min_part_size)
     43     else:
     44         raise NotImplementedError('webhdfs support for mode %r not implemented' % mode)

~/Github/tools/.venv/lib/python3.7/site-packages/smart_open/webhdfs.py in __init__(self, uri_path, min_part_size)
    129                                      params=payload, allow_redirects=False)
    130         if not init_response.status_code == httplib.TEMPORARY_REDIRECT:
--> 131             raise WebHdfsException(str(init_response.status_code) + "\n" + init_response.content)
    132         uri = init_response.headers['location']
    133         response = requests.put(uri, data="", headers={'content-type': 'application/octet-stream'})

TypeError: can only concatenate str (not "bytes") to str
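For context, the frame that fails in webhdfs.py is the first step of a WebHDFS write. Per the WebHDFS REST API, a file is created with two requests: a PUT with op=CREATE to the NameNode, which should answer 307 TEMPORARY_REDIRECT with a Location header, followed by a PUT of the file data to that location. Any other status code sends smart_open into the error branch where the TypeError occurs, so the real HTTP error is hidden. As a debugging aid, the minimal sketch below builds the first-step URL so the request can be sent by hand (for example with curl -i -X PUT) to see the actual status and body. The helper name, the host, the port 9870 and the user name are illustrative assumptions, not values from the issue or from smart_open.

```python
from urllib.parse import quote, urlencode


def webhdfs_create_url(host: str, port: int, path: str, user: str, overwrite: bool = True) -> str:
    """Build the step-one CREATE URL described in the WebHDFS REST API docs (hypothetical helper).

    A healthy NameNode replies 307 TEMPORARY_REDIRECT; its Location header
    then names the DataNode that accepts the file contents in step two.
    """
    query = urlencode({
        "op": "CREATE",
        "user.name": user,  # pseudo (simple) authentication, no Kerberos
        "overwrite": str(overwrite).lower(),
    })
    # quote() percent-encodes unsafe characters but keeps the '/' separators.
    return f"http://{host}:{port}/webhdfs/v1{quote(path)}?{query}"


# Hypothetical values for illustration only.
print(webhdfs_create_url("namenode.example", 9870, "/user/alice/test_file.txt", "alice"))
# → http://namenode.example:9870/webhdfs/v1/user/alice/test_file.txt?op=CREATE&user.name=alice&overwrite=true
```

Sending that request with redirects disabled shows the status code and response body that the TypeError is masking.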

test_file.txt is a plain ASCII text file. I'm using Python 3.7.

Any guidance you could provide would be awesome. The end use case is copying files from S3 to WebHDFS and back again.

Thanks!!!
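For the end use case above (S3 to WebHDFS and back), a minimal sketch of a chunked copy follows. It assumes the caller passes smart_open's open as the opener so that s3:// and webhdfs:// URIs are handled; the default is the built-in open, for local paths. The name stream_copy and the 1 MiB chunk size are illustrative choices, not part of smart_open. Copying fixed-size chunks with shutil.copyfileobj avoids relying on line boundaries, which matters for binary files that contain few or no newlines.

```python
import shutil
from typing import IO, Callable


def stream_copy(
    source_uri: str,
    dest_uri: str,
    opener: Callable[..., IO[bytes]] = open,
    chunk_size: int = 1024 * 1024,
) -> None:
    """Copy source_uri to dest_uri in binary chunks (hypothetical helper).

    `opener` must accept (uri, mode) like the built-in open; passing
    smart_open's open lets the same code handle s3:// and webhdfs:// URIs.
    """
    with opener(source_uri, "rb") as src, opener(dest_uri, "wb") as dst:
        # Reads and writes at most chunk_size bytes at a time, so the whole
        # file is never held in memory at once.
        shutil.copyfileobj(src, dst, chunk_size)
```

With smart_open installed, a call might look like stream_copy("s3://my-bucket/test_file.txt", "webhdfs://{username}@{host}:{port}/user/XXXX/test_file.txt", opener=smart_open.open), where the bucket name and the URI parts are placeholders.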

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments: 15 (5 by maintainers)

Top GitHub Comments

1 reaction
maddogdata commented, Jul 20, 2019

That’s a good question. I’m guessing the tests pass because the code still raises an exception, just not the informative one that was intended. I’ll verify that first and then submit a PR. Thanks for looking at this!
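The point about the tests can be illustrated with a small self-contained sketch: a check that only asks whether some exception is raised still passes when the message-building code fails with TypeError, while a check for the specific exception type exposes the bug. WebHdfsException here is a local stand-in, and buggy_raise and raises are hypothetical helpers written for this example, not smart_open's actual test code.

```python
class WebHdfsException(Exception):
    """Local stand-in for the exception the code is meant to raise."""


def buggy_raise(status_code: int, content: bytes) -> None:
    # Mirrors the bug: building the message (str + bytes) raises TypeError
    # before WebHdfsException is ever constructed.
    raise WebHdfsException(str(status_code) + "\n" + content)


def raises(exc_type: type, func, *args) -> bool:
    """Return True only if func(*args) raises an instance of exc_type."""
    try:
        func(*args)
    except exc_type:
        return True
    except Exception:
        return False
    return False


print(raises(Exception, buggy_raise, 403, b"denied"))         # → True (loose check passes)
print(raises(WebHdfsException, buggy_raise, 403, b"denied"))  # → False (precise check fails)
```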

1 reaction
maddogdata commented, Jul 20, 2019

Looking at this further, it looks like init_response.content on line 131 of webhdfs.py returns a byte string instead of a str, which is what causes the error. I changed this to ‘init_response.text’ and now I see the intended error message. Happy to open a PR to address this issue. It looks like the same problem may occur again on line 135 of the same module.
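A minimal sketch of the fix described above, assuming only the status code and the raw body bytes are available: decode the body before concatenating it (which is what requests' Response.text does, using the response's declared or detected encoding), so the intended exception is raised. WebHdfsException is a local stand-in and check_create_response is a hypothetical helper; the actual patch in smart_open may differ.

```python
from http import HTTPStatus


class WebHdfsException(Exception):
    """Local stand-in for smart_open's WebHdfsException."""


def check_create_response(status_code: int, content: bytes, encoding: str = "utf-8") -> None:
    """Raise WebHdfsException unless the NameNode answered 307 TEMPORARY_REDIRECT."""
    if status_code != HTTPStatus.TEMPORARY_REDIRECT:
        # Decode first: str + bytes raises TypeError and hides the real error.
        body = content.decode(encoding, errors="replace")
        raise WebHdfsException(str(status_code) + "\n" + body)


try:
    check_create_response(403, b"Permission denied")
except WebHdfsException as exc:
    print(repr(str(exc)))  # → '403\nPermission denied'
```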
