Tika not working with custom jar path
I'm working on a Python module that uses Tika, and I'm trying to use a custom jar file so that it does not get downloaded each time.
I have already placed the jar file and the md5 file inside the module:
my_module
========
__init__.py
package1
package2
package3
    __init__.py
    pdf.py
    tika-server.jar
    tika-server.jar.md5
pdf.py
====
import os

from tika import tika, parser

tika.TikaJarPath = os.path.dirname(__file__)


def get_pdf_text(path):
    parsed = parser.from_file(path)
    return parsed['content']
Tika does not work, and this is the output:
[WARNI] Failed to see startup log message; retrying...
[WARNI] Failed to see startup log message; retrying...
[WARNI] Failed to see startup log message; retrying...
[ERROR] Tika startup log message not received after 3 tries.
The problem happens when the jar file is inside the module. It works if I specify another location, but that's not an option because when I deploy the Python module, the module needs to contain the jar file.
Issue Analytics
- State:
- Created 4 years ago
- Comments:14 (1 by maintainers)
Top GitHub Comments
@amonaldo, you need to pass an absolute path as the argument to dirname, like this:
os.path.join(os.getcwd(), __file__)
You also need to override three variables of the tika module (log_path, TikaJarPath, and TikaFilesPath) to make your modified script work.
Modify your pdf.py (updating the filename):
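The commenter's modified pdf.py is not reproduced on this page. As a rough illustration of the advice above, here is a minimal sketch that builds absolute values for the three variables and sets them on the tika module. The helper names tika_path_settings and apply_tika_settings are hypothetical, and pointing log_path and TikaFilesPath at the system temp directory is an assumption, not something the thread confirms. os.path.abspath is used in place of joining with os.getcwd() by hand; for a relative path it gives the same result in normalized form.

```python
import os
import tempfile


def tika_path_settings(module_file):
    """Build absolute values for the three tika variables named above.

    Hypothetical helper. Assumes the jar and its md5 file sit in the
    same directory as module_file.
    """
    jar_dir = os.path.dirname(os.path.abspath(module_file))
    # Assumption: the temp directory is writable, so use it for the
    # log file and tika's working files.
    scratch_dir = tempfile.gettempdir()
    return {
        "TikaJarPath": jar_dir,
        "TikaFilesPath": scratch_dir,
        "log_path": scratch_dir,
    }


def apply_tika_settings(tika_module, settings):
    """Set each setting as an attribute on the given module object."""
    for name, value in settings.items():
        setattr(tika_module, name, value)


# Possible use inside pdf.py, before calling parser.from_file:
#
#     from tika import tika, parser
#     apply_tika_settings(tika, tika_path_settings(__file__))
```

Whether this resolves the startup error when the jar lives inside an installed package is not confirmed in the thread; the asker ended up using a different directory.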
@RafayGhafoor Thanks for your time, but I have found a solution, although it's not perfect.
I realized that I can get the user home directory using the os module:
tika.TikaJarPath = os.path.expanduser("~")
This way Tika works fine without any problems.