AttributeError: 'bytes' object has no attribute 'close'
Hi Guys,
I got the error below. I thought it was a server issue, so I killed the process running on port 9998 (as advised in #166), but that did not help.
FYI, the same code runs without problems in another virtual env.
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-5-d5e42260332e> in <module>
2 #'http://tika:9998/tika'
3 pdf_file = r'C:\Users\kx764qe\Desktop\checkerlist\4_Working\ACC_code_gds\FS\AUB_Financials_Dec_2018.pdf'
----> 4 data = parser.from_file(pdf_file, xmlContent=True)
~\AppData\Local\Continuum\anaconda3\envs\transformers\lib\site-packages\tika\parser.py in from_file(filename, serverEndpoint, xmlContent, headers, config_path, requestOptions)
37 else:
38 jsonOutput = parse1('all', filename, serverEndpoint, services={'meta': '/meta', 'text': '/tika', 'all': '/rmeta/xml'},
---> 39 headers=headers, config_path=config_path, requestOptions=requestOptions)
40 return _parse(jsonOutput)
41
~\AppData\Local\Continuum\anaconda3\envs\transformers\lib\site-packages\tika\tika.py in parse1(option, urlOrPath, serverEndpoint, verbose, tikaServerJar, responseMimeType, services, rawResponse, headers, config_path, requestOptions)
327 headers.update({'Accept': responseMimeType, 'Content-Disposition': make_content_disposition_header(path)})
328 status, response = callServer('put', serverEndpoint, service, open(path, 'rb'),
--> 329 headers, verbose, tikaServerJar, config_path=config_path, rawResponse=rawResponse, requestOptions=requestOptions)
330
331 if file_type == 'remote': os.unlink(path)
~\AppData\Local\Continuum\anaconda3\envs\transformers\lib\site-packages\tika\tika.py in callServer(verb, serverEndpoint, service, data, headers, verbose, tikaServerJar, httpVerbs, classpath, rawResponse, config_path, requestOptions)
544
545 resp = verbFn(serviceUrl, encodedData, **effectiveRequestOptions)
--> 546 encodedData.close() # closes the file reading data
547
548 if verbose:
AttributeError: 'bytes' object has no attribute 'close'
Here is the code snippet:
import tika
tika.TikaClientOnly = True
from tika import parser
#'http://tika:9998/tika'
pdf_file = r'C:\Users\kx764qe\Desktop\checkerlist\4_Working\ACC_code_gds\FS\AUB_Financials_Dec_2018.pdf'
data = parser.from_file(pdf_file, xmlContent=True)
Thanks. Luke
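For anyone reading the traceback above: the last frame shows callServer calling close() on encodedData, which at that point is a bytes object rather than an open file, and bytes has no close() method. Why it ended up as bytes in this environment is not visible in the traceback. The snippet below is only a minimal sketch that reproduces the error in isolation and shows one generic way to guard such a call. The helper name close_if_possible is hypothetical, it is not part of tika-python, and it is not necessarily what any proposed fix does.

```python
import io


def close_if_possible(data):
    """Close `data` only if it has a callable close(); report whether it was closed.

    Hypothetical helper for illustration, not part of tika-python.
    """
    close = getattr(data, "close", None)
    if callable(close):
        close()
        return True
    return False


# Reproduce the failure in isolation: bytes objects have no close() method.
payload = b"%PDF-1.4 example bytes"
try:
    payload.close()
except AttributeError as exc:
    error_message = str(exc)

# The guarded call works for both kinds of input.
stream = io.BytesIO(payload)
closed_stream = close_if_possible(stream)   # file-like object: gets closed
closed_bytes = close_if_possible(payload)   # bytes: nothing to close, skipped
```

With a guard like this, the caller would no longer fail when the request body has already been turned into bytes, while real file handles would still be closed.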
Issue Analytics
- Created 4 years ago
- Reactions:1
- Comments:6 (1 by maintainers)
Top GitHub Comments
It seems I am having the same issue. I proposed a fix here: #253
Hope this helps!
I really like the project, thanks for the work!
I am getting the same issue as well.
I am calling the parser like this:
doc = parser.from_file(file_name, serverEndpoint="http://localhost:9998/rmeta/text", headers={"X-Tika-PDFextractInlineImages": "true"})
I am running a Tika 2.0.0 server locally, with the intention of replacing it with a Docker container.