Tika-Python on Windows: Tika server returns status 503
I am testing Tika-Python on my Windows 10 laptop, but I cannot get it to work. I am using the following Python script (taken directly from this site, with 'path/to/file' of course replaced by a correct file path):
"""Test Apache Tika."""
import tika
tika.initVM()
from tika import parser
parsed = parser.from_file('path/to/file')
print(parsed['metadata'])
print(parsed['content'])
I get the following:
$ python test-tika.py
2018-11-22 10:24:34,112 [MainThread ] [WARNI] Tika server returned status: 503
Traceback (most recent call last):
File "test-tika.py", line 7, in <module>
parsed = parser.from_file('C:\\Users\\Christophe.Grandsire\\Cases\\Data\\AI in FDP\\Raw Pilot data\\magnus C&C.pdf')
File "C:\Users\Christophe.Grandsire\.virtualenvs\Tika-OOOIfOBP\lib\site-packages\tika\parser.py", line 40, in from_file
return _parse(jsonOutput)
File "C:\Users\Christophe.Grandsire\.virtualenvs\Tika-OOOIfOBP\lib\site-packages\tika\parser.py", line 77, in _parse
realJson = json.loads(jsonOutput[1])
File "c:\program files\python37\Lib\json\__init__.py", line 348, in loads
return _default_decoder.decode(s)
File "c:\program files\python37\Lib\json\decoder.py", line 337, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "c:\program files\python37\Lib\json\decoder.py", line 355, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
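The `JSONDecodeError` at the bottom looks like a symptom rather than the cause. Judging from the traceback, `tika.parser._parse` passes the second element of the server reply (apparently the response body) straight to `json.loads`, and the body of a 503 reply (empty, or an HTML error page) is not JSON. The sketch below reproduces that exact message with the standard library and shows a guard that surfaces the HTTP status instead. `parse_tika_json` is a hypothetical helper for illustration, not part of tika-python.

```python
import json


def parse_tika_json(status, body):
    """Hypothetical guard (not part of tika-python): decode the reply only
    when the server answered 200 with a non-empty body."""
    if status != 200 or not body:
        # Report the HTTP status instead of an opaque JSONDecodeError.
        raise RuntimeError(f"Tika server returned status {status}; body is not usable JSON")
    return json.loads(body)


if __name__ == "__main__":
    # An empty body or an HTML error page both fail at the very first character.
    for body in ("", "<html>503 Service Unavailable</html>"):
        try:
            json.loads(body)
        except json.JSONDecodeError as err:
            print(repr(body[:10]), "->", err)
    try:
        parse_tika_json(503, "")
    except RuntimeError as err:
        print(err)
```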
Here are the contents of tika.log, which show that the tika-server JAR was downloaded correctly, but that every attempt to use it returns a 503 status code:
2018-11-21 15:24:59,746 [MainThread ] [INFO ] Retrieving http://search.maven.org/remotecontent?filepath=org/apache/tika/tika-server/1.19/tika-server-1.19.jar to C:\Users\CHRIST~1.GRA\AppData\Local\Temp\tika-server.jar.
2018-11-21 15:27:44,758 [MainThread ] [INFO ] Retrieving http://search.maven.org/remotecontent?filepath=org/apache/tika/tika-server/1.19/tika-server-1.19.jar.md5 to C:\Users\CHRIST~1.GRA\AppData\Local\Temp\tika-server.jar.md5.
2018-11-21 15:27:45,319 [MainThread ] [WARNI] Failed to see startup log message; retrying...
2018-11-21 15:28:03,322 [MainThread ] [WARNI] Tika server returned status: 503
2018-11-21 15:35:25,203 [MainThread ] [WARNI] Tika server returned status: 503
2018-11-21 15:35:56,649 [MainThread ] [WARNI] Tika server returned status: 503
2018-11-22 10:23:32,192 [MainThread ] [WARNI] Failed to see startup log message; retrying...
2018-11-22 10:23:55,192 [MainThread ] [WARNI] Tika server returned status: 503
2018-11-22 10:24:34,112 [MainThread ] [WARNI] Tika server returned status: 503
And here is the latest tika-server.log file:
nov. 22, 2018 10:23:33 A.M. org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem
WARNING: J2KImageReader not loaded. JPEG2000 files will not be processed.
See https://pdfbox.apache.org/2.0/dependencies.html#jai-image-io
for optional dependencies.
nov. 22, 2018 10:23:33 A.M. org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem
WARNING: Tesseract OCR is installed and will be automatically applied to image files unless
you've excluded the TesseractOCRParser from the default parser.
Tesseract may dramatically slow down content extraction (TIKA-2359).
As of Tika 1.15 (and prior versions), Tesseract is automatically called.
In future versions of Tika, users may need to turn the TesseractOCRParser on via TikaConfig.
nov. 22, 2018 10:23:33 A.M. org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem
WARNING: org.xerial's sqlite-jdbc is not loaded.
Please provide the jar on your classpath to parse sqlite files.
See tika-parsers/pom.xml for the correct version.
INFO Starting Apache Tika 1.19 server
INFO Setting the server's publish address to be http://0.0.0.0:9998/
INFO Logging initialized @1629ms to org.eclipse.jetty.util.log.Slf4jLog
INFO jetty-9.4.z-SNAPSHOT; built: 2018-06-05T18:24:03.829Z; git: d5fc0523cfa96bfebfbda19606cad384d772f04c; jvm 11.0.1+13-LTS
INFO Started ServerConnector@63648ee9{HTTP/1.1,[http/1.1]}{0.0.0.0:9998}
INFO Started @2108ms
WARN Empty contextPath
INFO Started o.e.j.s.h.ContextHandler@1536602f{/,null,AVAILABLE}
INFO Started Apache Tika server at http://0.0.0.0:9998/
Any idea what is going on here?
Top GitHub Comments
Thank you very much for this hint. I am sitting behind a corporate proxy and had a misconfiguration that made my Python script connect to the local Tika server through the proxy. So it was not the Tika server but the proxy server that returned status code 503. I have now fixed my proxy settings and everything works as expected.
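If the 503 comes from a corporate proxy, as in the comment above, one common fix is to exclude the local host from proxying through the `NO_PROXY` environment variable before tika-python sends its first request. This sketch assumes that tika-python reaches the server through the `requests` library, which by default reads proxy settings (including `NO_PROXY`) from the environment, and that the server runs locally on port 9998. The helper names and the host list are illustrative assumptions. The check relies on the standard library's `urllib.request.proxy_bypass_environment`, whose host matching is similar to, but not guaranteed identical to, what `requests` does.

```python
import os
import urllib.request

# Assumption: the Tika server runs on this machine (default port 9998),
# so requests to it should never be routed through a corporate proxy.
LOCAL_HOSTS = ("localhost", "127.0.0.1")


def exclude_local_hosts_from_proxy(env=None):
    """Hypothetical helper: append the local host names to NO_PROXY.

    `env` defaults to os.environ; pass a plain dict to try it out safely.
    Returns the resulting NO_PROXY value.
    """
    if env is None:
        env = os.environ
    existing = env.get("NO_PROXY") or env.get("no_proxy") or ""
    entries = [e.strip() for e in existing.split(",") if e.strip()]
    for host in LOCAL_HOSTS:
        if host not in entries:
            entries.append(host)
    value = ",".join(entries)
    # Set both spellings: some tools only read one of them.
    env["NO_PROXY"] = value
    env["no_proxy"] = value
    return value


def local_requests_bypass_proxy(no_proxy_value, host="localhost:9998"):
    """Check, with the standard library's NO_PROXY matching, whether a
    request to `host` would skip the proxy."""
    return bool(urllib.request.proxy_bypass_environment(host, {"no": no_proxy_value}))


if __name__ == "__main__":
    # Run this before any tika call in the script.
    value = exclude_local_hosts_from_proxy()
    print("Tika requests bypass the proxy:", local_requests_bypass_proxy(value))
```

Setting the variable in the script itself keeps the fix local to this program; setting it system-wide in Windows instead would affect every tool that honors `NO_PROXY`.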
@BorisWiegand, can you take a look at https://cwiki.apache.org/confluence/display/TIKA/TikaOCR and try to interact with the Tika server that way? Does it work? That will isolate whether the problem is in Python or in your server setup.
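Following that suggestion, a minimal way to take tika-python out of the picture is to send the document to the server directly, as `curl -T file http://localhost:9998/tika` would, with all proxies explicitly disabled. This sketch assumes tika-server is listening on its default `localhost:9998` and that `PUT /tika` with `Accept: text/plain` returns the extracted text; `build_tika_request` and `send_without_proxy` are hypothetical names. If this returns 200 while tika-python still gets 503, the request path (for example a proxy) is a more likely culprit than the server.

```python
import sys
import urllib.error
import urllib.request

# Assumption: tika-server is listening on its default address.
TIKA_URL = "http://localhost:9998/tika"


def build_tika_request(data, url=TIKA_URL):
    """PUT the raw document bytes and ask for plain text back,
    mirroring `curl -T file http://localhost:9998/tika`."""
    return urllib.request.Request(
        url, data=data, method="PUT", headers={"Accept": "text/plain"}
    )


def send_without_proxy(request, timeout=60):
    """Send `request` with proxies explicitly disabled; return (status, text)."""
    # An empty ProxyHandler mapping disables any proxy picked up from the environment.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(request, timeout=timeout) as response:
            return response.status, response.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        # Error statuses such as 503 land here; report them instead of crashing.
        return err.code, err.read().decode("utf-8", errors="replace")


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as fh:
        status, text = send_without_proxy(build_tika_request(fh.read()))
    print("HTTP status:", status)
    print(text[:500])
```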