Getting error when trying to write .txt file from local fs to webhdfs.
Getting the following error when trying to write a text file from local to webhdfs:
TypeError: can only concatenate str (not "bytes") to str
Hoping that I am just doing something wrong. My code is as follows:
from smart_open import open

def smart_copy(source_file, sync_file):
    with open(source_file, 'rb') as source:
        with open(sync_file, 'wb') as sync:
            for line in source:
                sync.write(line)

smart_copy('./test_file.txt', 'webhdfs://{username}@{host}:{port}/user/XXXX/smart_copy/test_file.txt')
The stack trace for the error is:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-40-b63bd711d13c> in <module>
----> 1 smart_copy('./test_file.txt', 'webhdfs://XXXX@XXX.XXX.XXX.XXX:XXXXX/user/XXXX/smart_copy/test_file.txt')
<ipython-input-38-694f70cf0776> in smart_copy(source_file, sync_file)
3 '''
4 with open(source_file, 'rb') as source:
----> 5 with open(sync_file, 'wb') as sync:
6 for line in source:
7 sync.write(line)
~/Github/tools/.venv/lib/python3.7/site-packages/smart_open/smart_open_lib.py in open(uri, mode, buffering, encoding, errors, newline, closefd, opener, ignore_ext, transport_params)
346 except KeyError:
347 binary_mode = mode
--> 348 binary, filename = _open_binary_stream(uri, binary_mode, transport_params)
349 if ignore_ext:
350 decompressed = binary
~/Github/tools/.venv/lib/python3.7/site-packages/smart_open/smart_open_lib.py in _open_binary_stream(uri, mode, transport_params)
560 elif parsed_uri.scheme == "webhdfs":
561 kw = _check_kwargs(smart_open_webhdfs.open, transport_params)
--> 562 return smart_open_webhdfs.open(parsed_uri.uri_path, mode, **kw), filename
563 elif parsed_uri.scheme.startswith('http'):
564 #
~/Github/tools/.venv/lib/python3.7/site-packages/smart_open/webhdfs.py in open(uri, mode, min_part_size)
40 return BufferedInputBase(uri)
41 elif mode == 'wb':
---> 42 return BufferedOutputBase(uri, min_part_size=min_part_size)
43 else:
44 raise NotImplementedError('webhdfs support for mode %r not implemented' % mode)
~/Github/tools/.venv/lib/python3.7/site-packages/smart_open/webhdfs.py in __init__(self, uri_path, min_part_size)
129 params=payload, allow_redirects=False)
130 if not init_response.status_code == httplib.TEMPORARY_REDIRECT:
--> 131 raise WebHdfsException(str(init_response.status_code) + "\n" + init_response.content)
132 uri = init_response.headers['location']
133 response = requests.put(uri, data="", headers={'content-type': 'application/octet-stream'})
TypeError: can only concatenate str (not "bytes") to str
test_file.txt is just an ascii text file. Using python 3.7.
Any guidance that you could provide would be awesome. End use case is copying files from s3 to webHDFS and back again.
Thanks!!!
Issue Analytics
- State:
- Created 4 years ago
- Comments:15 (5 by maintainers)
That’s a good question. I’m guessing the tests pass because it still raises an exception, just not the informative one that was intended. I will have to look first to verify and then I’ll submit a PR. Thanks for looking at this!
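To illustrate the guess above, here is a minimal sketch of how a loosely written test could pass while the wrong exception is raised. The names `WebHdfsException` (redefined locally), `open_for_write`, and `raises` are hypothetical stand-ins, not the library's actual code or test suite.

```python
class WebHdfsException(Exception):
    """Hypothetical local stand-in for the library's exception type."""


def open_for_write(status_code, body):
    # Mirrors the failing line: on Python 3, str + bytes raises TypeError
    # before WebHdfsException can even be constructed.
    if status_code != 307:  # 307 is TEMPORARY_REDIRECT
        raise WebHdfsException(str(status_code) + "\n" + body)


def raises(exc_type, func, *args):
    """Return True if func(*args) raises an instance of exc_type."""
    try:
        func(*args)
    except exc_type:
        return True
    except Exception:
        return False
    return False


# A check for "any exception" is satisfied by the accidental TypeError.
loose = raises(Exception, open_for_write, 403, b"Forbidden")
# A check for the intended exception type exposes the bug.
strict = raises(WebHdfsException, open_for_write, 403, b"Forbidden")
print(loose, strict)
```

Asserting on the specific exception type (and ideally on the message) is what would make such a test fail until the concatenation is fixed.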
Looking at this further, it looks like init_response.content on line 131 of webhdfs.py returns a byte string rather than a str, which is what causes the TypeError when it is concatenated with the status code. I changed it to init_response.text and now I am seeing the intended error message. Happy to open a PR to address this issue. The same problem may occur again on line 135 of the same module.
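The difference between the two attributes can be reproduced without a WebHDFS server. The sketch below uses `FakeResponse`, a hypothetical stand-in that mimics only the `status_code`, `content` (bytes), and `text` (decoded str) attributes of a requests response; the two helper functions are likewise illustrative, not the code in webhdfs.py.

```python
class FakeResponse:
    """Hypothetical stand-in for a requests response (assumption, not the real class)."""

    def __init__(self, status_code, content, encoding="utf-8"):
        self.status_code = status_code
        self.content = content      # raw body as bytes
        self.encoding = encoding

    @property
    def text(self):
        # Decoded body as str, analogous to what .text provides.
        return self.content.decode(self.encoding, errors="replace")


def error_message_buggy(response):
    # str + bytes: raises TypeError on Python 3.
    return str(response.status_code) + "\n" + response.content


def error_message_fixed(response):
    # str + str: builds the intended message.
    return str(response.status_code) + "\n" + response.text


resp = FakeResponse(403, b"Permission denied")

try:
    error_message_buggy(resp)
    buggy_error = None
except TypeError as exc:
    buggy_error = str(exc)

fixed_message = error_message_fixed(resp)
print(buggy_error)
print(fixed_message)
```

With the decoded text, the exception message carries the status code and the server's response body, which is the information the original `raise` was meant to surface.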