Training segmentation model error - "too many synchronization issues, file not used in training data and to be fixed!"

We are trying to add more training data. For some (but not all) of the files we add, as well as some of the files included from the start, we get the following error when running ./gradlew train_segmentation:

VARNING: 100.v84-264.training.segmentation.tei.xml / too many synchronization issues, file not used in training data and to be fixed!

I’ve looked through some of the files for fixes to make, but I can’t find anything that stands out. Is there any way to see more information about the error? My current hypothesis is that we’ve misunderstood the structure required by the model. We don’t have any nested fields, but not everything is covered by the other tags, such as body. Example:

<tei>
	<teiHeader>
		...
	</teiHeader>
	<text>
		<note place="headnote">some note</note>
		<page>1</page> out of 200 <!-- this text is outside of any other tag. Could this be the error? -->
		<body>
		    some body <lb/>
		     some more body
		</body>
	</text>
</tei>
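
To test the hypothesis above on real files, one option is to list any text that sits directly inside <text> without being wrapped in a child element. The following is only a diagnostic sketch: find_stray_text is a hypothetical helper (not part of GROBID), it assumes the training files are well-formed XML without a default namespace, and nothing in this thread confirms that such stray text is what triggers the warning.

```python
import xml.etree.ElementTree as ET


def find_stray_text(xml_string, container="text"):
    """Return text fragments located directly inside <container>,
    i.e. not wrapped in any child element.

    Hypothetical helper for inspection only; not GROBID's own check.
    """
    root = ET.fromstring(xml_string)
    stray = []
    for elem in root.iter(container):
        # Text between the opening tag and the first child element.
        if elem.text and elem.text.strip():
            stray.append(elem.text.strip())
        # Text that follows each child's closing tag (its "tail").
        for child in elem:
            if child.tail and child.tail.strip():
                stray.append(child.tail.strip())
    return stray


# Simplified version of the example above (teiHeader content omitted).
SAMPLE = """<tei>
  <teiHeader/>
  <text>
    <note place="headnote">some note</note>
    <page>1</page> out of 200
    <body>some body <lb/> some more body</body>
  </text>
</tei>"""

print(find_stray_text(SAMPLE))  # ['out of 200']
```

Running this over each rejected file would at least show whether the rejected files share untagged text that the accepted ones lack.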

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 5

Top GitHub Comments

1 reaction
lfoppiano commented, Oct 22, 2020

@karatekaneen I think you have an old version. I can see this file was modified in April/May 2020, when we updated the various models. If you do a git pull, you should be able to get the current version.

0 reactions
karatekaneen commented, Oct 22, 2020

I could have sworn that I had not modified the file, but that is clearly not quite true. I’m going to do a hard reset and rebuild the project, and this problem (at least for this file) will probably go away.

Is it correct to assume that the errors for the other files also stemmed from their contents having been modified?

Anyways, thanks for the help and thanks for the awesome project!
