grobid 0.7.2 not recognizing greek alphabets

I’m running grobid 0.7.2 on Windows 11 using Docker. I followed the instructions in your documentation to run the Docker image lfoppiano/grobid:0.7.2.

While PDF extraction and sentence extraction work like a charm, grobid is not recognizing Greek letters in my PDFs. A lot of the text in my PDFs has a Greek letter as a suffix to an English word. For example, here’s a sentence from my PDF: "It has also been reported that the production of interferon-g (IFN-g) may be lowered." The g here is actually written as gamma in the PDF (I just didn’t know how to type Greek letters here).

grobid is converting the Greek letter gamma (shown as g above) to the Unicode character U+2425 (SYMBOL FOR DELETE FORM TWO).
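To confirm which code points actually end up in the extracted text, a small check like the one below may help. It is only a sketch using Python's standard `unicodedata` module. The function name `find_suspect_chars` and the list of "suspect" ranges are assumptions made for illustration; they are not part of grobid.

```python
import unicodedata

# Code point ranges treated as "suspect" in extracted text. This list is an
# assumption for this sketch, not a grobid rule.
SUSPECT_RANGES = [
    (0x2400, 0x243F),  # Control Pictures block, which contains U+2425
    (0xE000, 0xF8FF),  # Private Use Area, often seen with custom font encodings
    (0xFFFD, 0xFFFD),  # REPLACEMENT CHARACTER
]


def find_suspect_chars(text):
    """Return (offset, 'U+XXXX', name) for each suspect character in text."""
    hits = []
    for offset, ch in enumerate(text):
        cp = ord(ch)
        if any(lo <= cp <= hi for lo, hi in SUSPECT_RANGES):
            # Private Use Area characters have no Unicode name, hence the default.
            name = unicodedata.name(ch, "<unnamed>")
            hits.append((offset, f"U+{cp:04X}", name))
    return hits


if __name__ == "__main__":
    # The sentence from the report, with U+2425 where gamma was expected.
    sample = "production of interferon-\u2425 (IFN-\u2425) may be lowered."
    for hit in find_suspect_chars(sample):
        print(hit)
```

Running the same function over text that contains a real gamma (U+03B3) reports nothing, so any hits point at the characters that were mapped incorrectly during extraction.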

I also changed the sentence segmenter to the pragmatic sentence detector in my local YAML file and mounted it using the following command:

docker run --rm -p 8070:8070 -p 8070:8070 -v "D:/grobid/grobid-0.7.2/grobid-0.7.2/grobid-home/config/grobid.yaml":/opt/grobid/grobid-home/config/grobid.yaml:ro lfoppiano/grobid:0.7.2

grobid is still not parsing Greek letters correctly. I’m pretty sure I’m missing something. Can someone please help me?

Thanks in advance, PD

Issue Analytics

  • State: closed
  • Created 10 months ago
  • Comments: 6 (3 by maintainers)

Top GitHub Comments

1 reaction
prashanthdumpuriNeurance commented, Nov 27, 2022

Thanks @kermitt2. I’m closing this issue.

P.S. Not only is grobid a great tool, but support is also great. I’m glad I stumbled upon it.

1 reaction
prashanthdumpuriNeurance commented, Nov 27, 2022

Thanks @kermitt2. I suspected it was a PDF encoding issue. And thanks for your offer to check it out in detail. Here’s the link to the PDF:

https://www.academia.edu/download/73371279/4000650.pdf

Let me know if it does not work and I can upload the PDF to Dropbox or Google Drive. Also, can you please let me know how you debug it?

Thanks again, PD
