v2.0 system for definition extractor
Develop v2.0 system for doc-level user interfaces and doc-level evaluation
(See dykang/definition_extractor_v2.0)
- Model ensemble
- Development/Testing in the internal NLP repo (+2.0 F1)
- Deployment to frontend system (undecided because of scalability concerns)
- [x] Fine-tuning roberta-large with S2ORC (with @kyleclo) (deprecated due to low performance)
- Initial development in the internal NLP repo
- Scale-up for the entire S2ORC dataset: data split and caching
- Deployment to frontend system
- Categorization of term types (with @andrewhead):
- Types:
- Symbols
- Protologisms (i.e., new words defined in the paper)
- Abbreviations
- Entities for each type:
- Symbols: nicknames, definitions
- Protologisms: definitions
- Abbreviations: expansions
- [x] Extend the acronym/nickname detector to non-definitional sentences
- Add document-level features (e.g., term position, document frequency)
- Added `position_ratio` for the relative position of a term in the entire document
- Added `section_name` for the name of the section in which the term appears
- Add cross-document-level features (e.g., term position, document frequency, cross-distance)
- Make a dictionary of previously detected terms and symbols
- Within-document term grouping based on their definitions (deprecated)
- Add coreference links
- Analysis between term occurrences and detected term definitions
- Add confidence score
- Development/Testing in the internal NLP repo
- Deployment to frontend system
- Fix BERT tokenization error
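
For reference, the two document-level features listed above (`position_ratio` and `section_name`) could be computed roughly as follows. This is only a minimal sketch, not the code in the repo: it assumes terms and sections are located by character offsets, and the names `Section`, `TermFeatures`, and `document_level_features` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Section:
    # Hypothetical representation: a section is its name plus the
    # character offset at which it starts in the document.
    name: str
    start: int


@dataclass
class TermFeatures:
    position_ratio: float          # relative position of the term in the document
    section_name: Optional[str]    # section containing the term, if any


def document_level_features(
    term_start: int, doc_length: int, sections: List[Section]
) -> TermFeatures:
    """Compute position_ratio and section_name for one term occurrence."""
    if doc_length <= 0:
        raise ValueError("doc_length must be positive")

    # Relative position: 0.0 at the start of the document, close to 1.0 at the end.
    position_ratio = term_start / doc_length

    # The containing section is the last one starting at or before the term.
    section_name = None
    for section in sorted(sections, key=lambda s: s.start):
        if section.start <= term_start:
            section_name = section.name
        else:
            break

    return TermFeatures(position_ratio=position_ratio, section_name=section_name)


sections = [Section("Introduction", 0), Section("Method", 500)]
features = document_level_features(term_start=750, doc_length=1000, sections=sections)
```

Token indices would work equally well in place of character offsets, as long as the term position and the document length use the same unit.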
Issue Analytics
- State:
- Created 3 years ago
- Comments: 6 (6 by maintainers)
Top GitHub Comments
Closing this issue as we approach merging #146 to master. The remaining features mentioned here will be merged into the v3 system or deprecated for lack of usefulness.
Target papers for general user study
Target papers for comparison study