How to recognize blanks (spaces), and recognize English and Chinese in one model

See original GitHub issue

First of all, your code is great. I trained on the SynthText90k dataset and achieved very good performance on English words.

I have several questions and hope you can give me a hand. Thank you very much for your time.

  1. How can a blank (space) within a sentence be recognized? For example, I want to recognize “I love python”, which has a blank between “I” and “love”. How should this be handled? Should I just add a blank to the alphabet, like this, and prepare the training data accordingly? alphabet = """0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ """

  2. Can English and Chinese be recognized in one model? If so, how? Should the alphabet just contain all the English and Chinese characters, like this? alphabet = """0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ是不我一有大在人了中到資..."""

  3. What about very long sentences? Would it be better to train with very long sentences, or can we just train with short ones? Your current model only supports text lengths of less than 26, so I would have to modify the network to train with long sentences.
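
The alphabet idea in questions 1 and 2 can be sketched roughly as follows. This is a minimal illustration, not the repository's own converter: the names `ALPHABET`, `CTC_BLANK`, `build_char_to_index`, `encode`, and `ctc_greedy_decode` are hypothetical, and reserving index 0 for the CTC blank is an assumption about the label convention. Note that the CTC blank is a separate symbol from the literal space character; whether adding a space to the labels is a good idea is a different matter (the maintainer reply below advises against it).

```python
from typing import Dict, List

# Hypothetical alphabet: digits, ASCII letters, a literal space, and the few
# Chinese characters quoted in the question. A real alphabet would list every
# character that occurs in the training labels.
ALPHABET = (
    "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
    " "
    "是不我一有大在人了中到資"
)

# Assumption: index 0 is reserved for the CTC blank, so characters start at 1.
CTC_BLANK = 0


def build_char_to_index(alphabet: str) -> Dict[str, int]:
    """Map each character to a class index, keeping 0 free for the CTC blank."""
    if len(set(alphabet)) != len(alphabet):
        raise ValueError("alphabet contains duplicate characters")
    return {ch: i + 1 for i, ch in enumerate(alphabet)}


def encode(text: str, char_to_index: Dict[str, int]) -> List[int]:
    """Turn a label string into class indices; raises KeyError on unknown characters."""
    return [char_to_index[ch] for ch in text]


def ctc_greedy_decode(indices: List[int], alphabet: str) -> str:
    """Collapse repeated indices and drop CTC blanks, then map back to characters."""
    chars = []
    prev = CTC_BLANK
    for idx in indices:
        if idx != CTC_BLANK and idx != prev:
            chars.append(alphabet[idx - 1])
        prev = idx
    return "".join(chars)
```

With this layout the model would need `len(ALPHABET) + 1` output classes, one per character plus the CTC blank. Mixed labels such as "I love 中" are encoded the same way as pure English ones.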

Issue Analytics

  • State: open
  • Created 3 years ago
  • Comments: 8 (2 by maintainers)

Top GitHub Comments

1 reaction
Holmeyoung commented, Apr 25, 2020

Hi,

  1. Recognizing a sentence and segmenting a sentence are two different things. Simply adding a blank to the labels is not recommended.

  2. Just make the alphabet contain all the English and Chinese characters, as you said.

  3. Calculate the T length (number of time steps) at the last LSTM. The wider you resize the image, the longer the text you can train with. One location for one word.
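
The T-length calculation in point 3 can be sketched as below. This assumes the widely used CRNN convolution stack (seven convolutions and four poolings, with the last two poolings using stride 1 along the width); this repository's network may differ, so `WIDTH_LAYERS` should be checked against the actual model. The helper names `time_steps` and `min_ctc_steps` are hypothetical.

```python
from typing import List, Tuple

# Assumed (kernel, stride, padding) along the width axis for each layer of a
# classic CRNN backbone. Replace with the values of the network actually used.
WIDTH_LAYERS: List[Tuple[int, int, int]] = [
    (3, 1, 1),  # conv0
    (2, 2, 0),  # pool0
    (3, 1, 1),  # conv1
    (2, 2, 0),  # pool1
    (3, 1, 1),  # conv2
    (3, 1, 1),  # conv3
    (2, 1, 1),  # pool2 (stride 1 along width)
    (3, 1, 1),  # conv4
    (3, 1, 1),  # conv5
    (2, 1, 1),  # pool3 (stride 1 along width)
    (2, 1, 0),  # conv6
]


def time_steps(image_width: int, layers: List[Tuple[int, int, int]] = WIDTH_LAYERS) -> int:
    """Number of time steps T fed to the LSTM for a given input image width."""
    w = image_width
    for kernel, stride, padding in layers:
        # Standard output-size formula for convolution and pooling layers.
        w = (w + 2 * padding - kernel) // stride + 1
    return w


def min_ctc_steps(label: str) -> int:
    """Fewest time steps CTC needs for a label: one per character, plus one
    blank between each pair of identical adjacent characters."""
    repeats = sum(1 for a, b in zip(label, label[1:]) if a == b)
    return len(label) + repeats
```

Under these assumptions a 100-pixel-wide input gives `time_steps(100) == 26`, which matches the length limit mentioned in the question, and a 400-pixel-wide input gives 101. A label can only be trained with CTC if `min_ctc_steps(label) <= time_steps(width)`.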

0 reactions
cvchongci commented, Aug 13, 2020

@ducbluee Hi, I used very limited synthetic data to train the model, so the model does not work well on real-world images. You can follow the way I handle blanks.
