Detect typeface styles

Congrats on the fantastic tool. Have you given any thought to making the network aware of italic, bold, etc., as well as different types of typefaces? As far as I can tell this should (hopefully) be a relatively small change.

Here’s how I imagine it could be implemented:

1. Treat all “stylistic” info (the specific font, whether it’s bold, italic, etc.) as an extra closed-class classification problem. The person doing the training is responsible for specifying which stylistic labels are present in the training data. E.g. if the training data has two different typefaces, a main font and an alternate font, and the alternate can optionally be italic, then the new stylistic classifier will have the following classes: main, alternate, alternate_italic.
2. The training data is somehow annotated with stylistic info. I imagine this is the slightly more annoying bit to implement. One could use some kind of XML markup to denote segments of characters that are in a font different from the main one, e.g.
   ```
   This is the main font, then we have <alternate>some text in
   the alternate font</alternate> and finally
   <alternate_italic>the alternate font in italic</alternate_italic>
   ```
3. In the forward pass of the network, the old character classifier is kept, but the new stylistic classifier is also run to predict the correct font.
4. ???
5. Profit!
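To make the annotation idea more concrete, here is a minimal sketch of how such markup could be turned into plain text plus one style label per character. The function name `parse_styled_line`, the tag syntax, and the rule that tags may not be nested are all assumptions made for illustration; nothing like this is part of Calamari.

```python
import re

# Hypothetical tag syntax: <name> ... </name>, following the example in the issue.
TAG_RE = re.compile(r"<(/?)([A-Za-z_]+)>")


def parse_styled_line(line, default_style="main"):
    """Split an annotated line into plain text and one style label per character."""
    text = []
    styles = []
    current = default_style
    pos = 0
    for match in TAG_RE.finditer(line):
        # Characters before this tag belong to the style that is currently open.
        chunk = line[pos:match.start()]
        text.append(chunk)
        styles.extend([current] * len(chunk))
        closing, name = match.group(1), match.group(2)
        if closing:
            if name != current:
                raise ValueError(f"unexpected closing tag </{name}>")
            current = default_style
        else:
            if current != default_style:
                raise ValueError("nested style tags are not supported in this sketch")
            current = name
        pos = match.end()
    # Trailing text after the last tag.
    chunk = line[pos:]
    text.append(chunk)
    styles.extend([current] * len(chunk))
    if current != default_style:
        raise ValueError(f"unclosed tag <{current}>")
    return "".join(text), styles


text, styles = parse_styled_line("ab <alternate>cd</alternate> e")
print(text)
print(styles)
```

The plain text could then serve as the usual OCR ground truth, while the label list would be the ground truth for the stylistic classifier.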

Top GitHub Comments

chreul commented, Apr 27, 2019

We experimented with something similar when working on a historical lexicon: https://zenodo.org/record/1451482#.XMReFegzY2w In this case study we decided to treat the task as two separate sequence classification problems: textual OCR and typography tagging. The respective models were trained and applied separately, and the results were combined in a postprocessing step using the positional information from Calamari’s extended prediction data output. As of now, I think this is the best way to do it. Of course, the computational effort increases, but the codecs stay minimal and each model can focus on its specific subtask. I would love a generic implementation of this, but @ChWick is a little busy (i.e. lazy) right now 😃.
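The postprocessing step described above could look roughly like the following sketch. It assumes that both models yield, for each predicted symbol, a label and a horizontal pixel range in the line image; the `Span` class and the `combine` function are hypothetical and do not reflect the actual format of Calamari’s extended prediction data.

```python
from dataclasses import dataclass


@dataclass
class Span:
    """A predicted symbol with its horizontal extent (hypothetical format)."""
    label: str
    start: int  # first pixel column, inclusive
    end: int    # last pixel column, exclusive


def overlap(a, b):
    """Number of pixel columns shared by two spans."""
    return max(0, min(a.end, b.end) - max(a.start, b.start))


def combine(ocr_spans, typo_spans, default_style="main"):
    """Attach to every OCR character the typography label that overlaps it most."""
    combined = []
    for char in ocr_spans:
        best = max(typo_spans, key=lambda t: overlap(char, t), default=None)
        if best is None or overlap(char, best) == 0:
            style = default_style  # no typography prediction covers this character
        else:
            style = best.label
        combined.append((char.label, style))
    return combined


ocr = [Span("a", 0, 10), Span("b", 10, 20), Span("c", 22, 30)]
typo = [Span("main", 0, 12), Span("italic", 12, 30)]
print(combine(ocr, typo))
```

Choosing the label with the largest overlap is only one possible rule; the case study may well have used a different one.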

ChWick commented, Aug 7, 2020

Just random answers:

- You can use the ATR model as “pretrained weights”, so you do not have to start from scratch. One alternative is to train both models in parallel, i.e. share the conv, pool, and LSTM layers and add two FC layers (one for OCR, one for typography) with two loss functions. I tested this, but it performed very similarly to (even slightly worse than) having two models. This code is, however, not integrated into Calamari.
- Having bbbbbbbb instead of a single b has several advantages: a) it can capture typographic changes within a word (we had a project where this was quite often the case), and b) it is straightforward to use the pretrained OCR weights. I think @chreul tested this approach.
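As an illustration of the “bbbbbbbb instead of one b” idea, the sketch below builds a typography target that carries one style code per character, so that it has the same length as the OCR transcription. The one-letter codes, the function name, and the decision to keep whitespace unchanged are assumptions made for this example only.

```python
# Hypothetical one-letter codes for the typography classes.
STYLE_CODES = {"main": "m", "alternate": "a", "alternate_italic": "i"}


def typography_target(text, styles, codes=STYLE_CODES):
    """Build a target with one style code per character, keeping whitespace as is."""
    if len(text) != len(styles):
        raise ValueError("need exactly one style label per character")
    return "".join(
        ch if ch.isspace() else codes[style]
        for ch, style in zip(text, styles)
    )


# A style change in the middle of a word stays visible in the target.
print(typography_target("abcd", ["main", "main", "alternate", "alternate"]))
```

A single label per word could not express the change inside "abcd", which is the first advantage mentioned above.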

Using a PC to determine the typography at each “pixel” seems like an interesting idea, and I would expect good results. However:

- The word/character-level annotation might not be very accurate.
- This must be fully implemented (a lot of work).

The big advantage is that the “alignment” step is omitted, and I like that! So feel free to test this approach! It will work if you have enough time and training data.

It is also possible to “share” some code. I use the positional prediction of the Calamari types to obtain a pixel-wise labeling (similar to your PC approach!) to solve the alignment.
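A rough sketch of how positional predictions could be expanded into such a pixel-wise labeling is shown below. The `(label, start, end)` tuples are a hypothetical input format, and the majority vote is just one simple way to read a label back for a character’s pixel range; the actual code used here is not shown in the thread.

```python
from collections import Counter


def columnwise_labels(spans, width):
    """Expand (label, start, end) spans into one label per pixel column.

    Columns that no span covers stay None.
    """
    columns = [None] * width
    for label, start, end in spans:
        for x in range(max(0, start), min(width, end)):
            columns[x] = label
    return columns


def majority_label(columns, start, end):
    """Most frequent label within [start, end), ignoring uncovered columns."""
    votes = Counter(c for c in columns[start:end] if c is not None)
    return votes.most_common(1)[0][0] if votes else None


columns = columnwise_labels([("main", 0, 2), ("italic", 3, 5)], width=6)
print(columns)
print(majority_label(columns, 1, 5))
```

With a label available for every column, each OCR character can look up its style directly from its own pixel range.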