Training segmentation model error - "too many synchronization issues, file not used in training data and to be fixed!"
We are trying to add more training data. For some (but not all) of the files we add, and for some of the files that were included from the start, we get the following error when running ./gradlew train_segmentation:
VARNING: 100.v84-264.training.segmentation.tei.xml / too many synchronization issues, file not used in training data and to be fixed!
I've looked through some of the files for things to fix, but nothing stands out. Is there any way to get more information about the error? My current hypothesis is that we have misunderstood the structure the model requires. We don't have any nested fields, but not all of the text is wrapped in one of the other tags (body etc.). Example:
<tei>
<teiHeader>
...
</teiHeader>
<text>
<note place="headnote">some note</note>
<page>1</page> out of 200 <!-- this text is outside of any other tag. Could this be the error? -->
<body>
some body <lb/>
some more body
</body>
</text>
</tei>
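One way to test this hypothesis is to list any non-whitespace text that sits directly under <text> rather than inside a child element. The following is a minimal Python sketch that uses only the standard library. It assumes the training file parses as plain XML with a <text> element directly under the root and no namespace. The helper name stray_text is hypothetical, and this thread does not confirm whether untagged text actually triggers the warning.

```python
import xml.etree.ElementTree as ET


def stray_text(xml_string: str) -> list[str]:
    """Hypothetical helper: return text found directly inside <text>
    that is not wrapped in any child element (e.g. <body>, <note>)."""
    root = ET.fromstring(xml_string)
    # Assumes <text> is a direct child of the root, with no XML namespace.
    text_el = root.find("text")
    if text_el is None:
        return []
    found = []
    # Text between the opening <text> tag and its first child element.
    if text_el.text and text_el.text.strip():
        found.append(text_el.text.strip())
    # Text that follows each direct child element (the element's "tail").
    for child in text_el:
        if child.tail and child.tail.strip():
            found.append(child.tail.strip())
    return found


if __name__ == "__main__":
    sample = (
        '<tei><text><note place="headnote">some note</note>'
        "<page>1</page> out of 200"
        "<body>some body <lb/>some more body</body></text></tei>"
    )
    print(stray_text(sample))  # ['out of 200']
```

An empty list means every piece of text under <text> is inside some tag, so the hypothesis can be ruled out for that file.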
Issue created 3 years ago; 5 comments.
Top GitHub Comments
@karatekaneen I think you have an old version. I can see this file was modified in April/May 2020, when we updated the various models. If you do a git pull, you should get the current version.
I could have sworn I did not modify the file, but clearly that's not quite true. I'm going to do a hard reset and rebuild the project, and this problem (at least for this file) will probably go away.
Is it correct to assume that the errors for the other files were also caused by their contents having been modified?
Anyways, thanks for the help and thanks for the awesome project!
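To check whether the other flagged files were edited in the same way, one rough approach is to compare the words in each local training file against the same file from an up-to-date checkout. The following sketch assumes that the warning appears when the text of a TEI file no longer matches the text the training data was originally generated from. This thread does not confirm that assumption, and the sketch does not reproduce GROBID's own alignment. The names tei_tokens and diff_tokens are hypothetical.

```python
import difflib
import re
import xml.etree.ElementTree as ET

# Words and single punctuation marks: a rough stand-in for real tokenisation.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def tei_tokens(xml_string: str) -> list[str]:
    """Collect the tokens of all text under <text>, ignoring the markup."""
    root = ET.fromstring(xml_string)
    text_el = root.find("text")
    source = text_el if text_el is not None else root  # fall back to whole doc
    # Joining with spaces may split a word that straddles a tag boundary;
    # acceptable for a rough comparison.
    return TOKEN_RE.findall(" ".join(source.itertext()))


def diff_tokens(reference: str, edited: str) -> list[str]:
    """Describe each span where the edited file's tokens differ from the reference."""
    ref, new = tei_tokens(reference), tei_tokens(edited)
    matcher = difflib.SequenceMatcher(a=ref, b=new, autojunk=False)
    return [
        f"{op}: reference {ref[i1:i2]} -> edited {new[j1:j2]}"
        for op, i1, i2, j1, j2 in matcher.get_opcodes()
        if op != "equal"
    ]


if __name__ == "__main__":
    upstream = "<tei><text><body>some body <lb/>some more body</body></text></tei>"
    local = "<tei><text><body>some body <lb/>some extra body</body></text></tei>"
    for line in diff_tokens(upstream, local):
        print(line)  # replace: reference ['more'] -> edited ['extra']
```

An empty result means the words are identical and only the markup (or whitespace) differs. A non-empty result points to the exact spots where the text was changed.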