
Upgrade lucene library to 9.4.0 for jena-text


Version

4.7.0-SNAPSHOT

Feature

There is a migration guide:

https://lucene.apache.org/core/9_4_0/MIGRATE.html

For Jena to build, it seems to be enough to change the version and to migrate the dependency org.apache.lucene:lucene-analyzers-common to org.apache.lucene:lucene-analysis-common.

However, the migration guide states that Lucene sometimes uses JUL:

Lucene Core now logs certain warnings and errors using Java Util Logging (JUL). It is therefore recommended to install wrapper libraries with JUL logging handlers to feed the log events into your app’s own logging system.

Under normal circumstances Lucene won’t log anything, but in the case of a problem users should find the logged information in the usual log files.

Lucene also provides a JavaLoggingInfoStream implementation that logs IndexWriter events using JUL.

To feed Lucene’s log events into the well-known Log4J system, we refer to the Log4j JDK Logging Adapter in combination with the corresponding system property: java.util.logging.manager=org.apache.logging.log4j.jul.LogManager.

I think the options are:

Don’t follow the recommendation: messages go to the console (stderr with JUL's default ConsoleHandler) and should be visible in the different contexts in which Jena is run.
Add a dependency (in the main pom.xml and the jena-text pom.xml?) on https://mvnrepository.com/artifact/org.apache.logging.log4j/log4j-jul and add a note in the jena-text documentation about setting the system property for logging.
OR add the dependency https://mvnrepository.com/artifact/org.slf4j/jul-to-slf4j for the SLF4J bridge (which comes with a note on the performance of the bridge: https://www.slf4j.org/legacy.html#jul-to-slf4j).
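To make the difference between these options more concrete, here is a minimal sketch of what a JUL bridge does, using only java.util.logging from the JDK. The class `JulForwardingSketch`, the handler `ForwardingHandler`, and the logger name `org.apache.lucene.demo` are hypothetical names chosen for illustration; a real bridge (log4j-jul or jul-to-slf4j) would hand the records to Log4j or SLF4J rather than collect them in a list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Handler;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

public class JulForwardingSketch {

    /**
     * Hypothetical stand-in for a bridge handler: it collects formatted
     * records instead of passing them to Log4j or SLF4J.
     */
    static final class ForwardingHandler extends Handler {
        final List<String> forwarded = new ArrayList<>();

        @Override
        public void publish(LogRecord record) {
            if (!isLoggable(record)) {
                return;
            }
            forwarded.add(record.getLevel() + " " + record.getLoggerName()
                    + ": " + record.getMessage());
        }

        @Override
        public void flush() {
            // Nothing is buffered in this sketch.
        }

        @Override
        public void close() {
            // No resources to release.
        }
    }

    /**
     * Attaches the handler to the JUL root logger, logs one warning under a
     * Lucene-like logger name, and returns what the handler received.
     */
    static List<String> demo() {
        Logger root = Logger.getLogger("");
        ForwardingHandler handler = new ForwardingHandler();
        root.addHandler(handler);
        try {
            // Stands in for a warning that Lucene itself would emit via JUL.
            Logger.getLogger("org.apache.lucene.demo").warning("simulated Lucene warning");
        } finally {
            root.removeHandler(handler);
        }
        return handler.forwarded;
    }

    public static void main(String[] args) {
        demo().forEach(System.out::println);
    }
}
```

With the first option nothing like this handler is installed, so the record only reaches JUL's default console handler. With the other two options the bridge library plays the role of the handler above and routes the record into the application's own logging configuration.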

Change in behaviour:

StandardAnalyzer looks like it is used by default:

English stopwords are no longer removed by default in StandardAnalyzer (LUCENE-7444). To retain the old behaviour, pass EnglishAnalyzer.ENGLISH_STOP_WORDS_SET as an argument to the constructor.
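As a rough illustration of what this change means for indexed terms, the sketch below compares tokenization with and without stop-word removal. It uses only the JDK and does not call Lucene: the class `StopWordSketch` and its `tokens` method are hypothetical, the tokenizer is a simple split on non-alphanumeric characters rather than Lucene's StandardTokenizer, and the stop-word set is copied from the list in Lucene's EnglishAnalyzer source that is quoted later in this thread.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;

public class StopWordSketch {

    // Copied from the EnglishAnalyzer stop-word list quoted in this thread.
    static final Set<String> ENGLISH_STOP_WORDS = Set.of(
            "a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "if", "in",
            "into", "is", "it", "no", "not", "of", "on", "or", "such", "that", "the",
            "their", "then", "there", "these", "they", "this", "to", "was", "will", "with");

    /**
     * Simplified tokenizer (an assumption, not Lucene's algorithm):
     * lower-cases the text, splits on non-alphanumeric characters,
     * and drops any token contained in stopWords.
     */
    static List<String> tokens(String text, Set<String> stopWords) {
        return Arrays.stream(text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+"))
                .filter(t -> !t.isEmpty())
                .filter(t -> !stopWords.contains(t))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String text = "The cat is on the mat";
        // Old default behaviour: stop words removed.
        System.out.println(tokens(text, ENGLISH_STOP_WORDS)); // [cat, mat]
        // New default behaviour: no stop words configured.
        System.out.println(tokens(text, Set.of()));           // [the, cat, is, on, the, mat]
    }
}
```

In practical terms, an index built after the upgrade would contain terms such as "the" and "is" unless the analyzer is given the stop-word set explicitly, as the migration note describes.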

I’ve looked through the other notes, mostly by checking for usage (grep -R in the jena-text folder), and I think these are the parts of the migration guide that affect Jena. One further concern is that the changes could break custom, drop-in configured implementations, since there are a lot of changes in the paths for the artifacts and elsewhere?

Are you interested in contributing a solution yourself?

Yes

Issue Analytics

State:
Created a year ago
Comments:9 (8 by maintainers)

Top GitHub Comments


afs commented, Nov 4, 2022

Absent any other information, let’s do this upgrade.


OyvindLGjesdal commented, Oct 30, 2022

I think the current documentation points to following the Lucene behavior, since it mentions multiple times that the StandardAnalyzer from Lucene is used (and implicitly its behavior?):

The default analyzer defaults to Lucene’s StandardAnalyzer.

If a Lucene or Elasticsearch text index is used, then by default the Lucene StandardAnalyzer is used.

The multilingual analyzer becomes the default analyzer and the Lucene StandardAnalyzer is the default analyzer used when there is no language tag.

Maybe a note could be added to the documentation:

Note: From Lucene version 9, English stopwords are no longer removed by default in StandardAnalyzer. This also changes the default behavior for Jena 4.X. You can keep the old behavior by configuring a custom analyzer in the assembler. (Link to the custom analyzer, or to the source code of the assembler containing the list of English stop words?)

(List from https://github.com/apache/lucene/blob/d5d6dc079395c47cd6d12dcce3bcfdd2c7d9dc63/lucene/analysis/common/src/java/org/apache/lucene/analysis/en/EnglishAnalyzer.java#L48)

("a" "an" "and" "are" "as" "at" "be" "but" "by" "for" "if" "in" 
 "into" "is" "it" "no" "not" "of" "on" "or" "such" "that" "the" 
"their" "then" "there" "these" "they" "this" "to" "was" "will" "with")
