v2.0 system for definition extractor

Develop v2.0 system for doc-level user interfaces and doc-level evaluation

(See dykang/definition_extractor_v2.0)

  • Model ensemble
    • Development/Testing in the internal NLP repo (+2.0 F1)
    • Deployment to frontend system (undecided because of scalability concerns)
  • [x] Fine-tuning roberta-large with S2ORC (with @kyleclo) (deprecated due to low performance)
    • Initial development in the internal NLP repo
    • Scale-up for the entire S2ORC dataset: data split and caching
    • Deployment to frontend system
  • Categorization of term types: (with @andrewhead )
    • Types:
      • Symbols
      • Protologisms (i.e., new words defined in the paper)
      • Abbreviations
    • Entities for each type:
      • Symbols: nicknames, definitions
      • Protologisms: definitions
      • Abbreviations: expansions
    • [x] Extend the acronym/nickname detector to non-definitional sentences
  • Add document-level features (e.g., term position, document frequency)
    • Added position_ratio for the relative position of a term in the entire document
    • Added section_name for the name of the section in which the term appears
  • Add cross-document-level features (e.g., term position, document frequency, cross-distance)
    • Make a dictionary of previously detected terms and symbols
  • Within-document term grouping based on their definitions (deprecated)
    • Add coreference links
    • Analysis between term occurrences and detected term definitions
  • Add confidence score
    • Development/Testing in the internal NLP repo
    • Deployment to frontend system
  • Fix BERT tokenization error
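
The two document-level features named in the list above, position_ratio and section_name, could be computed roughly as follows. This is only a minimal sketch and not the code from the repository: the `TermOccurrence` class, the `document_features` function, the input shape (a list of section name and token list pairs), and the definition of position_ratio as token index divided by total token count are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class TermOccurrence:
    """One occurrence of a term with its document-level features (hypothetical schema)."""
    term: str
    position_ratio: float  # assumed: token index / total tokens in the document
    section_name: str      # name of the section containing the occurrence


def document_features(
    sections: List[Tuple[str, List[str]]], term: str
) -> List[TermOccurrence]:
    """Collect document-level features for every exact-match occurrence of `term`.

    `sections` is an ordered list of (section_name, tokens) pairs.
    """
    total_tokens = sum(len(tokens) for _, tokens in sections)
    occurrences = []
    offset = 0  # number of tokens seen in earlier sections
    for section_name, tokens in sections:
        for index, token in enumerate(tokens):
            if token == term:
                ratio = (offset + index) / total_tokens
                occurrences.append(TermOccurrence(term, ratio, section_name))
        offset += len(tokens)
    return occurrences


if __name__ == "__main__":
    paper = [
        ("Introduction", ["we", "define", "alpha", "here"]),
        ("Method", ["alpha", "is", "used", "again"]),
    ]
    for occurrence in document_features(paper, "alpha"):
        print(occurrence)
```

A real extractor would likely match multi-token terms and work on model tokenization rather than whitespace tokens; the sketch only shows how a relative position and a section name can be attached to each occurrence.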

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 6 (6 by maintainers)

Top GitHub Comments

2 reactions
dykang commented, Nov 29, 2020

Closing this issue as we approach merging #146 into master. The remaining features mentioned here will either be merged into the v3 system or deprecated because they are not useful enough.

0 reactions
dykang commented, Aug 21, 2020

Target papers for general user study

  • 1804.08199, 2002.04138, 1909.13433
  • 1704.05572, 1609.05143, 1611.01603

Target papers for comparison study

  • ?