Idea on how to handle manually editing the docs files while keeping the automated conversion

Decided to create a central discussion about this here instead of all of the ideas getting lost in various threads 😃

Here are some previous notes I found:

https://github.com/apache/lucenenet/issues/282#issuecomment-635345730 (note: a lot has changed from this discussion to now!)
https://github.com/apache/lucenenet/pull/393
https://github.com/apache/lucenenet/pull/231
https://github.com/apache/lucenenet/pull/229#issuecomment-520831453 (scroll down to the “code samples” heading)

The issue is: we have a conversion program that converts the Lucene (Java) docs to a doc format that we can use for Lucene.Net. Each time it is run, it overwrites our docs files. This means that any manual updates made to these docs files, such as fixing spelling mistakes and, more importantly, code snippets, would just be overwritten.

I have a simple proposal

In various old threads we had some complex proposals, but none of them is ‘perfect’ and none would cater very well for things like spelling mistakes. Instead of inventing some system, we can simply use Git:

We have a branch for each Lucene (Java) docs conversion that we plan to do.
- Currently this is just one: 4.8.1. This branch can be called something like docs/converted/4.8.1 and is created from master.
These converted branches are the only branches on which we ever execute the JavaDocToMarkdownConverter conversion program.
- So currently, we would run JavaDocToMarkdownConverter once we create this branch. There will probably be no changes now since we’ve already executed this in master.
Merge the converted branch to the master branch.
Make any required changes to the master branch docs files.
If we need to re-run JavaDocToMarkdownConverter because we’ve fixed some of its conversions, formatting, etc., we re-run it on the converted branch, then merge the changes to the master branch. This will trigger a bunch of merge conflicts, which can be resolved during the Git merge.
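
To make the sequence above concrete, here is a rough dry-run sketch in Python that only lists the Git commands for one conversion cycle and executes nothing. The function name `conversion_plan`, the commit message, and the `first_run` flag are made up for illustration; the branch name follows the docs/converted/4.8.1 suggestion. The converter's own command line is not shown in this thread, so it is left as a comment rather than guessed.

```python
import shlex
from typing import List


def conversion_plan(version: str, first_run: bool) -> List[List[str]]:
    """Return the git commands for one conversion cycle (dry run only)."""
    converted = f"docs/converted/{version}"  # branch name suggested in the proposal
    plan: List[List[str]] = []
    if first_run:
        # Create the converted branch from master.
        plan.append(["git", "checkout", "-b", converted, "master"])
    else:
        # Re-use the existing converted branch.
        plan.append(["git", "checkout", converted])
    # JavaDocToMarkdownConverter would be run at this point. Its invocation
    # is not given in this thread, so it is deliberately omitted here.
    plan.append(["git", "add", "--all"])
    # Note: git refuses to commit when the converter produced no changes.
    plan.append(["git", "commit", "-m", f"Re-run docs conversion for {version}"])
    # Bring the regenerated docs into master; any conflicts surface here.
    plan.append(["git", "checkout", "master"])
    plan.append(["git", "merge", converted])
    return plan


if __name__ == "__main__":
    for cmd in conversion_plan("4.8.1", first_run=False):
        print(shlex.join(cmd))
```

Manual fixes are then committed on master as usual; the converted branch only ever receives converter output.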

We won’t be running this conversion program that often, so the amount of merging would be minimal. We’d only have merge conflicts on files where our manual changes clash with changes made by the converter, which would be quite rare, so I don’t foresee a lot of work with a Git merge.
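
As a rough illustration of why conflicts should be rare, here is a simplified model of a three-way merge in Python. It works per file, whereas Git actually merges per hunk within a file, so this is a coarser approximation and not how Git is implemented. The file names and contents are hypothetical.

```python
from typing import Dict, List, Tuple

Docs = Dict[str, str]  # path -> file content


def merge_docs(base: Docs, converted: Docs, master: Docs) -> Tuple[Docs, List[str]]:
    """File-level three-way merge: returns (merged docs, conflicting paths)."""
    merged: Docs = {}
    conflicts: List[str] = []
    for path in sorted(set(base) | set(converted) | set(master)):
        b, c, m = base.get(path), converted.get(path), master.get(path)
        if c == m:
            result = m  # both sides agree
        elif m == b:
            result = c  # only the converter changed this file
        elif c == b:
            result = m  # only a manual edit changed this file
        else:
            conflicts.append(path)  # both changed it differently
            result = m  # keep master's text until someone resolves it
        if result is not None:
            merged[path] = result
    return merged, conflicts


# Common ancestor: the last converter output that was merged to master.
base = {"overview.md": "teh index", "package.md": "see <code>Foo</code>"}
# Manual fix on master: spelling in overview.md only.
master = {"overview.md": "the index", "package.md": "see <code>Foo</code>"}
# Converter fix on the converted branch: formatting in package.md only.
converted = {"overview.md": "teh index", "package.md": "see `Foo`"}

merged, conflicts = merge_docs(base, converted, master)
print(merged)
print(conflicts)
```

Here the spelling fix and the converter fix touch different files, so both survive and the conflict list is empty. A conflict only appears when the same file was edited by hand and also regenerated differently.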

Some basic rules to this:

We will never merge into a converted branch from master, since this would mean merge conflicts are not triggered and everything would just be overwritten again.
When we start working on a newer version of Lucene.Net, a new converted branch will be created from the current converted branch and the conversion is run there for the new Lucene version.
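
The first rule can be illustrated with a small Python sketch of how a merge base is found in a commit graph. The commit names and graph shapes below are hypothetical, and the function is a simplified stand-in for what `git merge-base` computes, not Git's actual implementation.

```python
from typing import Dict, List, Set

Graph = Dict[str, List[str]]  # commit -> parent commits


def ancestors(graph: Graph, commit: str) -> Set[str]:
    """All commits reachable from `commit`, including itself."""
    seen: Set[str] = set()
    stack = [commit]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(graph[c])
    return seen


def merge_bases(graph: Graph, a: str, b: str) -> Set[str]:
    """Common ancestors that are not ancestors of another common ancestor."""
    common = ancestors(graph, a) & ancestors(graph, b)
    return {
        c for c in common
        if not any(c != d and c in ancestors(graph, d) for d in common)
    }


# Recommended flow: converted is only ever merged INTO master.
good: Graph = {
    "A": [],            # master before any conversion
    "C1": ["A"],        # converter run on the converted branch
    "M1": ["A", "C1"],  # converted merged into master
    "E1": ["M1"],       # manual doc fix on master
    "C2": ["C1"],       # converter re-run on the converted branch
}
print(merge_bases(good, "E1", "C2"))

# Rule broken: master was merged into the converted branch first.
bad: Graph = {
    "A": [],
    "C1": ["A"],
    "M1": ["A", "C1"],
    "E1": ["M1"],
    "X": ["C1", "E1"],  # master merged into converted
    "C2b": ["X"],       # converter re-run, overwriting the manual fix
}
print(merge_bases(bad, "E1", "C2b"))
```

In the first graph the merge base is `C1`, so master's manual fix counts as a change relative to the base and takes part in a real three-way merge. In the second graph the merge base is `E1`, master's own tip, so master has no changes of its own from Git's point of view and the regenerated files are taken without any conflict.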

@NightOwl888 Any thoughts on that?

Shazwazza commented, Feb 21, 2021

@NightOwl888 This is complete now. There is a new branch, docs/markdown-converted/4.8.1, which will be used solely for re-executing the dotnet global tool for the docs conversion.

What needs to happen now is:

Rebuild and publish the website and docs to reflect recent changes and fixes
Update the docs to reflect this approach

There’s a bunch of other outstanding docs tasks too that I’ll look into. I’ll close this one.

NightOwl888 commented, Mar 3, 2021

As an example of such a namespace, see https://lucenenet.apache.org/docs/4.8.0-beta00009/api/analysis-phonetic/Lucene.Net.Analysis.Phonetic.Language.html

That was actually not part of Lucene, but part of the commons-codec package from Apache, which was imported to save us from maintaining an external library and porting the parts of it we don’t need. There are a couple of others: the SAX and TagSoup modules, which were imported into Lucene.Net.Benchmarks to parse HTML. AFAIK, we could use HTML Agility Pack instead and dump these classes if someone were willing to analyze this at a higher level to map over the functionality.

The only actual Lucene case I can think of where we are missing the document is the migration guide (#399), presumably because it was named Migrate.txt instead of following the other “overview” and “package” naming conventions.

Lucene.Net.Codecs was only different because we were trying to release that document before we had the conversion process sorted out. Now that it is, would it make sense to integrate these changes back into the original doc?

Are there any other cases you can specifically recall where the documentation doesn’t exist in Lucene? If there are no other exceptions and your suggestion is not to use override files on the rest, I am on board with that - it would be fewer files to maintain and less confusing to contributors.