AttributeError: 'bytes' object has no attribute 'close'

Hi Guys,

I got the error below. I thought it was a server issue, so I killed the process running on port 9998 (as advised in #166), but that did not help.

FYI, the same code runs fine in another virtual environment.

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-5-d5e42260332e> in <module>
      2 #'http://tika:9998/tika'
      3 pdf_file = r'C:\Users\kx764qe\Desktop\checkerlist\4_Working\ACC_code_gds\FS\AUB_Financials_Dec_2018.pdf'
----> 4 data = parser.from_file(pdf_file, xmlContent=True)

~\AppData\Local\Continuum\anaconda3\envs\transformers\lib\site-packages\tika\parser.py in from_file(filename, serverEndpoint, xmlContent, headers, config_path, requestOptions)
     37     else:
     38         jsonOutput = parse1('all', filename, serverEndpoint, services={'meta': '/meta', 'text': '/tika', 'all': '/rmeta/xml'},
---> 39                             headers=headers, config_path=config_path, requestOptions=requestOptions)
     40     return _parse(jsonOutput)
     41 

~\AppData\Local\Continuum\anaconda3\envs\transformers\lib\site-packages\tika\tika.py in parse1(option, urlOrPath, serverEndpoint, verbose, tikaServerJar, responseMimeType, services, rawResponse, headers, config_path, requestOptions)
    327     headers.update({'Accept': responseMimeType, 'Content-Disposition': make_content_disposition_header(path)})
    328     status, response = callServer('put', serverEndpoint, service, open(path, 'rb'),
--> 329                                   headers, verbose, tikaServerJar, config_path=config_path, rawResponse=rawResponse, requestOptions=requestOptions)
    330 
    331     if file_type == 'remote': os.unlink(path)

~\AppData\Local\Continuum\anaconda3\envs\transformers\lib\site-packages\tika\tika.py in callServer(verb, serverEndpoint, service, data, headers, verbose, tikaServerJar, httpVerbs, classpath, rawResponse, config_path, requestOptions)
    544 
    545     resp = verbFn(serviceUrl, encodedData, **effectiveRequestOptions)
--> 546     encodedData.close() # closes the file reading data
    547 
    548     if verbose:

AttributeError: 'bytes' object has no attribute 'close'

Here is the code snippet:

import tika
tika.TikaClientOnly = True  # client-only mode: don't try to start a local Tika server
from tika import parser

#'http://tika:9998/tika'
pdf_file = r'C:\Users\kx764qe\Desktop\checkerlist\4_Working\ACC_code_gds\FS\AUB_Financials_Dec_2018.pdf'
data = parser.from_file(pdf_file, xmlContent=True)

Thanks. Luke
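The traceback itself pins down the failure: `parse1` passes `open(path, 'rb')` into `callServer` (line 328), yet by line 546 the value named `encodedData` is a `bytes` object, so the file's contents were evidently read out somewhere in between, and `bytes` has no `close()` method. The sketch below reproduces that sequence with an in-memory stand-in for the PDF. The intermediate `read()` step and the name `simulate_call_server` are assumptions inferred from the traceback, not a copy of tika-python's code.

```python
import io


def simulate_call_server(data):
    """Mimic the sequence implied by the traceback (hypothetical, simplified)."""
    # Assumption: the file handle is replaced by its contents before sending.
    encoded_data = data.read() if hasattr(data, "read") else data
    # ... the HTTP request would be sent here ...
    encoded_data.close()  # same call as line 546 in tika.py; fails on bytes


fake_pdf = io.BytesIO(b"%PDF-1.4 minimal stand-in")  # stands in for open(path, 'rb')
try:
    simulate_call_server(fake_pdf)
except AttributeError as exc:
    print(exc)  # 'bytes' object has no attribute 'close'
```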

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Reactions: 1
  • Comments: 6 (1 by maintainers)

Top GitHub Comments

2 reactions
Trofleb commented, Nov 13, 2019

It seems I am having the same issue; I proposed a fix here: #253

Hope this helps!

I really like the project, thanks for the work!
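A guard of the following kind avoids the error. It is a hypothetical sketch, not the patch from #253 (whose contents are not reproduced here): it closes the original file object that was opened and never calls `close()` on the `bytes` payload that was actually sent. `send_then_close` and the `send` callback are illustrative names, with `send` standing in for the HTTP request.

```python
import io


def send_then_close(send, data):
    """Send data via the send callback, then close the original handle.

    Hypothetical helper: `data` may be a file-like object or an
    already-encoded bytes/str payload.
    """
    payload = data.read() if hasattr(data, "read") else data
    try:
        return send(payload)
    finally:
        # Close only objects that support it; bytes and str do not.
        close = getattr(data, "close", None)
        if callable(close):
            close()


handle = io.BytesIO(b"%PDF-1.4 minimal stand-in")
result = send_then_close(len, handle)
print(result, handle.closed)                   # 25 True
print(send_then_close(len, b"already bytes"))  # 13
```

Using `try`/`finally` means the file handle is released even if the request raises.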

1 reaction
jordanparker6 commented, Nov 13, 2019

I am getting the same issue as well.

I am calling Tika with:

doc = parser.from_file(file_name, serverEndpoint="http://localhost:9998/rmeta/text", headers={"X-Tika-PDFextractInlineImages": "true"})

I am running a Tika 2.0.0 server locally and intend to replace it with a Docker container.
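Both reports involve a separately running Tika server, so one way to sidestep the client-side error is to PUT the file to the server directly. The sketch below uses only the Python standard library and targets `/rmeta/xml`, the service the traceback shows `parser.from_file(..., xmlContent=True)` selecting. The endpoint `http://localhost:9998`, the `Accept: application/json` header, and the helper names `build_rmeta_request` and `parse_pdf_via_rest` are assumptions for illustration, not part of tika-python.

```python
import json
import urllib.request

TIKA_ENDPOINT = "http://localhost:9998"  # assumption: server on the default port


def build_rmeta_request(pdf_bytes, endpoint=TIKA_ENDPOINT):
    """Build a PUT request for the server's /rmeta/xml service (hypothetical helper)."""
    return urllib.request.Request(
        endpoint + "/rmeta/xml",
        data=pdf_bytes,
        method="PUT",
        headers={"Accept": "application/json"},
    )


def parse_pdf_via_rest(path, endpoint=TIKA_ENDPOINT):
    """Send the file bytes directly and return the decoded JSON response."""
    with open(path, "rb") as fh:  # the with-block closes the file handle
        request = build_rmeta_request(fh.read(), endpoint)
    with urllib.request.urlopen(request) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Reading the file inside a `with` block guarantees the handle is closed, which is the job line 546 was attempting.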
