Heading in Chinese

I found that sumy distinguishes headings from other sentences, so I checked the source code and found that whether a line is a heading is decided by the str.isupper() function. But for a string composed of Chinese characters, if it contains even one uppercase Latin letter, isupper() returns True, although the line is actually just a normal sentence rather than a heading.

For example:

s1 = "你好啊,这儿有N盘蛋糕可以吃。"
s2 = "N你好啊,这儿有盘蛋糕可以吃。"
s3 = "你好啊,这儿有盘蛋糕可以吃。"

s1.isupper()  # True
s2.isupper()  # True
s3.isupper()  # False
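
The results above follow from how str.isupper() is defined: it returns True when the string contains at least one cased character and every cased character is uppercase. Chinese characters are uncased, so they are simply ignored. A minimal sketch of a stricter check is shown here, assuming a hypothetical helper named looks_like_heading that is not part of sumy. It requires every alphabetic character, cased or not, to be uppercase, so a single Latin capital inside a Chinese sentence no longer qualifies:

```python
def looks_like_heading(line: str) -> bool:
    """Hypothetical stricter replacement for str.isupper() (not sumy's API).

    A line counts as a heading only if it has at least one alphabetic
    character and every alphabetic character is uppercase. Chinese
    characters are alphabetic but not uppercase, so they veto the match.
    """
    letters = [ch for ch in line if ch.isalpha()]
    return bool(letters) and all(ch.isupper() for ch in letters)


s1 = "你好啊,这儿有N盘蛋糕可以吃。"
s2 = "N你好啊,这儿有盘蛋糕可以吃。"
s3 = "你好啊,这儿有盘蛋糕可以吃。"

print(looks_like_heading(s1))              # False
print(looks_like_heading(s2))              # False
print(looks_like_heading(s3))              # False
print(looks_like_heading("INTRODUCTION"))  # True
```

Note that this only removes the false positives. It cannot recognize a real Chinese heading, because Chinese has no letter case to rely on.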

Issue Analytics

  • State: open
  • Created: 5 years ago
  • Reactions: 2
  • Comments: 6 (2 by maintainers)

Top GitHub Comments

1 reaction
seven-linglx commented, May 22, 2019

I agree with you that PlaintextParser should handle truly plain text, but you could provide an optional API in PlaintextParser that lets the user specify the headings of the text instead of having PlaintextParser detect them, because ideally formatted text is hard to come by. Letting the user decide this is better than introducing a new parser.

About your doubt:

  1. In my opinion, there is no common method for detecting headings. In a formatted document, you can detect a heading by its font size, since headings tend to use larger fonts. In plain text, the first paragraph has a high probability of being the heading.
  2. I think sumy's results on Chinese text are not bad. I am sorry that I can’t provide more help, because I don’t have enough experience with NLP. If I learn more, I will update my reply.
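
The optional API suggested at the start of this comment could look roughly like the sketch below. Everything in it is an assumption made for illustration: tag_lines and its is_heading parameter are hypothetical names, not part of sumy's PlaintextParser. The idea is only that the caller supplies the heading rule, with str.isupper kept as the default to mirror the current behavior described in the issue:

```python
from typing import Callable, List, Tuple


def tag_lines(
    text: str,
    is_heading: Callable[[str], bool] = str.isupper,
) -> List[Tuple[str, bool]]:
    """Hypothetical helper: pair each non-empty line with a heading flag.

    The caller decides what a heading is by passing `is_heading`.
    """
    tagged = []
    for line in text.splitlines():
        line = line.strip()
        if line:  # skip blank lines
            tagged.append((line, is_heading(line)))
    return tagged


text = "标题\n你好啊,这儿有N盘蛋糕可以吃。"

# Default rule reproduces the reported false positive on the second line.
print(tag_lines(text))

# A caller working with Chinese text can switch heading detection off.
print(tag_lines(text, is_heading=lambda line: False))
```

A caller could equally pass a rule such as "only the first line is a heading", which matches the plain-text heuristic in point 1.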

0 reactions
miso-belica commented, May 19, 2019

Thank you all. I think this is trickier than it looks. I tried to find a solution, but it seems I should introduce a new parser. Maybe a MarkdownParser, leaving PlaintextParser for truly plain text, but some summarizers use headings to give you better results. Or I could introduce a new parser for the common annotated text formats used for summarization. I don’t know.

I would like to ask you because I have no idea about Chinese texts.

  1. Are there any common ways to detect headings?
  2. Does it even make sense to do it in Chinese?
  3. Are there any common text formats used for Chinese texts in summarization? Or in NLP in general?
  4. Is there anything else that differs from English (or European) texts?

Thanks in advance and sorry for the really late reply. Have a nice day 🌞

