Upgrade lucene library to 9.4.0 for jena-text
Version
4.7.0-SNAPSHOT
Feature
There is a migration guide:
https://lucene.apache.org/core/9_4_0/MIGRATE.html
For Jena to build, it seems to be enough to change the version and migrate the dependency org.apache.lucene:lucene-analyzers-common to org.apache.lucene:lucene-analysis-common.
However, the migration guide states that Lucene now logs some warnings and errors using Java Util Logging (JUL):
Lucene Core now logs certain warnings and errors using Java Util Logging (JUL). It is therefore recommended to install wrapper libraries with JUL logging handlers to feed the log events into your app’s own logging system.
Under normal circumstances Lucene won’t log anything, but in the case of a problem users should find the logged information in the usual log files.
Lucene also provides a JavaLoggingInfoStream implementation that logs IndexWriter events using JUL.
To feed Lucene’s log events into the well-known Log4J system, we refer to the Log4j JDK Logging Adapter in combination with the corresponding system property: java.util.logging.manager=org.apache.logging.log4j.jul.LogManager.
I think the options are:
- Don’t follow the recommendation: messages then go to the console (JUL’s default ConsoleHandler writes to stderr) and should be visible in the different contexts Jena runs in
- Add a dependency (to the main pom.xml and the jena-text pom.xml?) on https://mvnrepository.com/artifact/org.apache.logging.log4j/log4j-jul and add a note to the jena-text documentation about setting the system property for logging
- OR add the dependency https://mvnrepository.com/artifact/org.slf4j/jul-to-slf4j for an SLF4J bridge (which comes with a note on the performance cost of the bridge: https://www.slf4j.org/legacy.html#jul-to-slf4j)
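With jul-to-slf4j, the bridge is normally installed at startup by calling `SLF4JBridgeHandler.removeHandlersForRootLogger()` and then `SLF4JBridgeHandler.install()`. The JDK-only sketch below illustrates the mechanism such a bridge relies on: it removes the root logger’s default handlers (a ConsoleHandler writing to stderr) and adds a Handler that forwards every LogRecord to the application’s logging. `ForwardingHandler`, the `Consumer<String>` sink and the logger name `org.apache.lucene.example` are hypothetical placeholders, not Lucene or SLF4J API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.logging.Handler;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

public class JulBridgeSketch {

    /** Hypothetical handler forwarding JUL records to an app-supplied sink (stand-in for SLF4J/Log4j). */
    static final class ForwardingHandler extends Handler {
        private final Consumer<String> sink;

        ForwardingHandler(Consumer<String> sink) {
            this.sink = sink;
        }

        @Override
        public void publish(LogRecord record) {
            if (!isLoggable(record)) {
                return; // respects this handler's level and filter
            }
            // Render as "LEVEL logger: message" for the target logging system.
            sink.accept(record.getLevel().getName() + " " + record.getLoggerName() + ": " + record.getMessage());
        }

        @Override
        public void flush() { }

        @Override
        public void close() { }
    }

    /** Replaces the root logger's handlers (by default a ConsoleHandler on stderr) with the bridge. */
    static void install(Handler bridge) {
        Logger root = Logger.getLogger("");
        for (Handler h : root.getHandlers()) {
            root.removeHandler(h);
        }
        root.addHandler(bridge);
    }

    public static void main(String[] args) {
        List<String> captured = new ArrayList<>();
        install(new ForwardingHandler(captured::add));

        // Hypothetical logger name; records propagate up to the root logger's handlers.
        Logger logger = Logger.getLogger("org.apache.lucene.example");
        logger.warning("something went wrong");
        logger.fine("below the default INFO threshold, so dropped");

        captured.forEach(System.out::println);
    }
}
```

With the JDK’s default logging configuration, running `main` prints only `WARNING org.apache.lucene.example: something went wrong`; the `fine` call is below the root logger’s default INFO level and never reaches the handler.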
Change in behaviour:
StandardAnalyzer appears to be used by default in jena-text:
English stopwords are no longer removed by default in StandardAnalyzer (LUCENE-7444). To retain the old behaviour, pass EnglishAnalyzer.ENGLISH_STOP_WORDS_SET as an argument to the constructor.
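To make the LUCENE-7444 change concrete, the sketch below compares the tokens produced with and without the English stop-word set. It is a simplified stand-in, not Lucene: the hypothetical `analyze` helper splits on non-alphanumeric characters and lowercases, whereas Lucene’s StandardTokenizer follows the Unicode UAX #29 word-break rules. The stop words are copied from `EnglishAnalyzer.ENGLISH_STOP_WORDS_SET`; in Lucene itself, the old behaviour corresponds to `new StandardAnalyzer(EnglishAnalyzer.ENGLISH_STOP_WORDS_SET)`, while `new StandardAnalyzer()` uses an empty stop set.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class StopWordSketch {

    // Copy of the words in Lucene's EnglishAnalyzer.ENGLISH_STOP_WORDS_SET.
    static final Set<String> ENGLISH_STOP_WORDS = Set.of(
        "a", "an", "and", "are", "as", "at", "be", "but", "by",
        "for", "if", "in", "into", "is", "it",
        "no", "not", "of", "on", "or", "such",
        "that", "the", "their", "then", "there", "these",
        "they", "this", "to", "was", "will", "with");

    /**
     * Hypothetical, simplified stand-in for StandardAnalyzer: split on runs of
     * non-letter/non-digit characters, lowercase, then drop stop words.
     */
    static List<String> analyze(String text, Set<String> stopWords) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.split("[^\\p{L}\\p{N}]+")) {
            if (raw.isEmpty()) {
                continue;
            }
            String token = raw.toLowerCase(Locale.ROOT);
            if (!stopWords.contains(token)) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "The quick brown fox is in the garden";
        // New default: no stop words removed (like new StandardAnalyzer()).
        System.out.println(analyze(text, Set.of()));
        // Old behaviour: English stop words removed.
        System.out.println(analyze(text, ENGLISH_STOP_WORDS));
    }
}
```

The first line prints `[the, quick, brown, fox, is, in, the, garden]` and the second `[quick, brown, fox, garden]`, so with the new default, words such as "the" and "is" end up as index terms.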
I’ve looked through the other notes, mostly by checking for usages (grep -R in the jena-text folder), and I think these are the parts of the migration guide that affect Jena. One open question: could these changes break custom, drop-in configured implementations, since there are a lot of changes in the artifact paths and elsewhere?
Are you interested in contributing a solution yourself?
Yes
Top GitHub Comments
Absent any other information, let’s do this upgrade.
I think the current documentation implies following the Lucene behavior, since it mentions multiple times that Lucene’s StandardAnalyzer is used (and, implicitly, its behavior?).
Maybe a note could be added to the documentation:
Note: From Lucene version 9, English stopwords are no longer removed by default in StandardAnalyzer. This also changes the default behavior for Jena 4.x. You can keep the old behavior by configuring a custom analyzer in the assembler. (Link to the custom analyzer documentation, or to source code containing the list of English stop words?)
(List from https://github.com/apache/lucene/blob/d5d6dc079395c47cd6d12dcce3bcfdd2c7d9dc63/lucene/analysis/common/src/java/org/apache/lucene/analysis/en/EnglishAnalyzer.java#L48)