Idea on how to handle manually editing the docs files while keeping the automated conversion
Decided to create a central discussion about this here instead of all of the ideas getting lost in various threads 😃
Here are some previous notes I found:
- https://github.com/apache/lucenenet/issues/282#issuecomment-635345730 (note: a lot has changed from this discussion to now!)
- https://github.com/apache/lucenenet/pull/393
- https://github.com/apache/lucenenet/pull/231
- https://github.com/apache/lucenenet/pull/229#issuecomment-520831453 (scroll down to the “code samples” heading)
The issue is: We have a conversion program that converts the Lucene (Java) docs to a doc format that we can use for Lucene.Net. Each time it is run, it overwrites our docs files. This means that any manual updates made to these docs files, such as fixing spelling mistakes and, more importantly, code snippets, would just be overwritten.
I have a simple proposal
In various old threads we had some complex proposals but none of these are ‘perfect’ and won’t cater very well for things like spelling mistakes. Instead of inventing some system, we can simply use Git:
- We have a branch for each Lucene (Java) docs conversion that we plan to do
  - Currently this is just one: 4.8.1. This branch can be called something like `docs/converted/4.8.1` and created from `master`
- These `converted` branches are the only branches on which we ever execute the `JavaDocToMarkdownConverter` conversion program.
  - So currently, we would run `JavaDocToMarkdownConverter` once we create this branch. There will probably be no changes now since we've already executed this in master.
- Merge the `converted` branch to the master branch
- Make any required changes to the master branch docs files
- If we need to re-run `JavaDocToMarkdownConverter` because we've made some fixes to it to fix some conversions, formatting, etc… we re-run this on the `converted` branch, then merge changes to the master branch. This will trigger a bunch of merge conflicts which can be resolved by Git merge.
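To make the steps above concrete, here is a minimal sketch that plays them out in a throwaway repository. It rests on a few assumptions: `run_converter` is a hypothetical stand-in that simply rewrites one file (the real `JavaDocToMarkdownConverter` invocation is not shown here), and `overview.md` and its contents are made up for illustration. In this sketch the manual fix and the converter change touch different lines, so Git's three-way merge keeps both without a conflict; edits to the same lines would surface as conflicts to resolve by hand.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Work in a throwaway repository so the sketch never touches a real checkout.
repo="$(mktemp -d)"
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # name the first branch "master" regardless of git defaults
git config user.email "docs@example.invalid"
git config user.name "Docs Sketch"
git commit -q --allow-empty -m "Initial commit"

converted="docs/converted/4.8.1"

# Hypothetical stand-in for JavaDocToMarkdownConverter: it regenerates the docs
# file from scratch, typo included. $1 is the heading line the "converter" emits.
run_converter() {
  printf '%s\n' "$1" "" "Indexing docs." "" "Query docs." "" "Serch is fast." > overview.md
}

# 1. Create the converted branch from master and run the conversion there.
git checkout -q -b "$converted" master
run_converter "Lucene 4.8.1 API"
git add overview.md
git commit -q -m "Run docs conversion"

# 2. Merge the converted branch into master.
git checkout -q master
git merge -q --no-edit "$converted"

# 3. Make manual docs fixes on master only.
sed 's/Serch/Search/' overview.md > overview.tmp && mv overview.tmp overview.md
git commit -q -am "Fix spelling in overview"

# 4. The converter was improved: re-run it on the converted branch only.
git checkout -q "$converted"
run_converter "## Lucene 4.8.1 API"
git commit -q -am "Re-run docs conversion"

# 5. Merge again. The manual fix and the converter change are both kept.
git checkout -q master
git merge -q --no-edit "$converted"
cat overview.md
```

After the last merge, `overview.md` on master carries the new heading from the re-run conversion and still has the spelling fix, which is the behaviour the proposal relies on.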
We won't be running this conversion program that often, so the amount of merging would be minimal. We'd only have merge conflicts on files where our manual changes clash heavily with changes made in the converter, which would be quite rare, so I don't foresee a lot of work with a Git merge.
Some basic rules to this:
- We will never merge into a `converted` branch from master since this will mean merge conflicts are not triggered and everything would just be overwritten again
- When we start working on a newer version of Lucene.Net, a new `converted` branch will be created from the current `converted` branch and the conversion is run there for the new Lucene version.
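These two rules can be sketched in the same way. The example below is an illustration under stated assumptions: the branch name `docs/converted/next` is a placeholder (the real one would carry the new Lucene version), the file contents are invented, and the `git merge-base --is-ancestor` loop is only one possible way to check that master has never been merged into a converted branch.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Throwaway repository, as before.
repo="$(mktemp -d)"
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.email "docs@example.invalid"
git config user.name "Docs Sketch"
git commit -q --allow-empty -m "Initial commit"

current="docs/converted/4.8.1"
next="docs/converted/next"   # hypothetical placeholder name

# Converter output is committed only on the converted branch.
git checkout -q -b "$current" master
echo "Converted 4.8.1 docs" > overview.md
git add overview.md
git commit -q -m "Docs conversion for 4.8.1"

# Manual edits are committed only on master, after merging the conversion in.
git checkout -q master
git merge -q --no-edit "$current"
echo "Manual note added on master" >> overview.md
git commit -q -am "Manual docs edit"

# Rule 2: the next converted branch starts from the current converted branch,
# not from master, so it contains no manual edits.
git checkout -q -b "$next" "$current"

# Rule 1 check: master's tip must not be reachable from any converted branch.
for branch in "$current" "$next"; do
  if git merge-base --is-ancestor master "$branch"; then
    echo "master was merged into $branch" >&2
    exit 1
  fi
done
echo "converted branches contain converter output only"
```

Note that the ancestor check is only meaningful once master has commits of its own; right after a converted branch is first created from master, master's tip is trivially an ancestor of it.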
@NightOwl888 Any thoughts on that?
@NightOwl888 This is complete now, there is a new branch `docs/markdown-converted/4.8.1` which will be used solely for re-executing the dotnet global tool for the docs conversion.
What needs to happen now is:
There’s a bunch of other outstanding docs tasks too that I’ll look into. I’ll close this one.
That was actually not part of Lucene, but part of the commons-codec package from Apache, which was imported to save us from maintaining an external library and porting the parts of it we don’t need. There are a couple others, the SAX and TagSoup modules that were imported into Lucene.Net.Benchmarks to parse HTML. AFAIK, we could use HTML Agility Pack instead and dump these classes if someone were willing to analyze this at a higher level to map over the functionality.
The only actual Lucene case I can think of where we are missing the document is the migration guide (#399), presumably because it was named Migrate.txt instead of following the other “overview” and “package” naming conventions.
Lucene.Net.Codecs was only different because we were trying to release that document before we had the conversion process sorted out. Now that it is, would it make sense to integrate these changes back into the original doc?
Are there any other cases you can specifically recall where the documentation doesn’t exist in Lucene? If there are no other exceptions and your suggestion is not to use override files on the rest, I am on board with that - it would be fewer files to maintain and less confusing to contributors.