
Upgrade Lucene library to 9.4.0 for jena-text


Version

4.7.0-SNAPSHOT

Feature

There is a migration guide:

https://lucene.apache.org/core/9_4_0/MIGRATE.html

For Jena to build, it seems to be enough to change the version and to migrate the dependency org.apache.lucene:lucene-analyzers-common to org.apache.lucene:lucene-analysis-common.

However, the migration guide states that Lucene now logs some events using Java Util Logging (JUL):

Lucene Core now logs certain warnings and errors using Java Util Logging (JUL). It is therefore recommended to install wrapper libraries with JUL logging handlers to feed the log events into your app’s own logging system.

Under normal circumstances Lucene won’t log anything, but in the case of a problem users should find the logged information in the usual log files.

Lucene also provides a JavaLoggingInfoStream implementation that logs IndexWriter events using JUL.

To feed Lucene’s log events into the well-known Log4J system, we refer to the Log4j JDK Logging Adapter in combination with the corresponding system property: java.util.logging.manager=org.apache.logging.log4j.jul.LogManager.
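The "JUL handler" approach can be illustrated with a minimal, JDK-only sketch. It assumes Lucene's JUL loggers are named under the `org.apache.lucene` namespace (an assumption; check the logger names your Lucene version actually uses), and `ForwardingHandler` is a hypothetical class that only collects messages where a real bridge would call into your own logging API. In practice a ready-made bridge such as the Log4j JDK Logging Adapter mentioned above would replace it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

public class LuceneJulBridgeSketch {

    // Keep a strong reference: JUL holds loggers weakly, and a logger that is
    // garbage collected would lose the handler attached to it.
    // ASSUMPTION: Lucene's loggers are children of "org.apache.lucene".
    private static final Logger LUCENE_ROOT = Logger.getLogger("org.apache.lucene");

    /** Hypothetical handler that forwards JUL records to the application's logging system. */
    static final class ForwardingHandler extends Handler {
        final List<String> forwarded = new ArrayList<>();

        @Override
        public void publish(LogRecord record) {
            if (!isLoggable(record)) {
                return;
            }
            // Placeholder: a real bridge would call SLF4J, Log4j, etc. here.
            forwarded.add(record.getLevel() + " " + record.getLoggerName() + ": " + record.getMessage());
        }

        @Override
        public void flush() {}

        @Override
        public void close() {}
    }

    /** Attaches a new ForwardingHandler to the assumed Lucene parent logger. */
    static ForwardingHandler install() {
        ForwardingHandler handler = new ForwardingHandler();
        handler.setLevel(Level.ALL);
        LUCENE_ROOT.addHandler(handler);
        // Stop records from also reaching the JUL root logger's console handler.
        LUCENE_ROOT.setUseParentHandlers(false);
        return handler;
    }

    public static void main(String[] args) {
        ForwardingHandler handler = install();
        // Simulate a warning from a child logger, as Lucene might emit one.
        Logger.getLogger("org.apache.lucene.index").warning("simulated Lucene warning");
        handler.forwarded.forEach(System.out::println);
    }
}
```

With the Log4j adapter instead, the system property is normally passed at JVM startup (for example `-Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager`), because JUL reads it only once, when its LogManager is first initialized.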

I think the options are:

Change in behaviour:

StandardAnalyzer appears to be used by default:

English stopwords are no longer removed by default in StandardAnalyzer (LUCENE-7444). To retain the old behaviour, pass EnglishAnalyzer.ENGLISH_STOP_WORDS_SET as an argument to the constructor.
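The effect of this change can be illustrated with a JDK-only sketch. `analyze` is a hypothetical, much-simplified stand-in for an analyzer (it lowercases and splits on non-alphanumeric characters, which is not Lucene's actual tokenization); the stop-word set is the list quoted later in this thread from Lucene's EnglishAnalyzer. In real Lucene 9 code, the migration note's fix would be along the lines of `new StandardAnalyzer(EnglishAnalyzer.ENGLISH_STOP_WORDS_SET)`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class StopWordSketch {

    // English stop words as listed from Lucene's EnglishAnalyzer later in this thread.
    static final Set<String> ENGLISH_STOP_WORDS = Set.of(
        "a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "if", "in",
        "into", "is", "it", "no", "not", "of", "on", "or", "such", "that", "the",
        "their", "then", "there", "these", "they", "this", "to", "was", "will", "with");

    /** Simplified stand-in for analysis: lowercase, split on non-alphanumerics, drop stop words. */
    static List<String> analyze(String text, Set<String> stopWords) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            // split() can yield a leading empty string when the text starts with punctuation.
            if (!token.isEmpty() && !stopWords.contains(token)) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "The Quick Brown Fox is in the garden";
        // Lucene 9 default per the migration note: no stop words removed.
        System.out.println(analyze(text, Set.of()));
        // Pre-9 behaviour: English stop words removed.
        System.out.println(analyze(text, ENGLISH_STOP_WORDS));
    }
}
```

For the sample sentence, the first call keeps "the", "is" and "in", while the second returns only [quick, brown, fox, garden]; a query for "the" would therefore match differently before and after the upgrade.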

I’ve looked through the other notes, mostly by checking for usage (grep -R in the jena-text folder), and I think these are the parts of the migration that affect Jena. One open question: could the changes break custom, drop-in configured implementations, since a lot of artifact paths and other things have changed?

Are you interested in contributing a solution yourself?

Yes

Issue Analytics

  • State: closed
  • Created a year ago
  • Comments: 9 (8 by maintainers)

Top GitHub Comments

1 reaction
afs commented, Nov 4, 2022

Absent any other information, let’s do this upgrade.

1 reaction
OyvindLGjesdal commented, Oct 30, 2022

I think the current documentation implies following Lucene's behavior, since it mentions multiple times that Lucene's StandardAnalyzer is used (and, implicitly, its behavior?):

The default analyzer defaults to Lucene’s StandardAnalyzer.

If a Lucene or Elasticsearch text index is used, then by default the Lucene StandardAnalyzer is used.

The multilingual analyzer becomes the default analyzer and the Lucene StandardAnalyzer is the default analyzer used when there is no language tag.

Maybe a note could be added to the documentation:

Note: From Lucene 9 onwards, English stopwords are no longer removed by default in StandardAnalyzer. This also changes the default behavior for Jena 4.x. You can keep the old behavior by configuring a custom analyzer in the assembler. (Link to the custom analyzer documentation, or to the assembler source code, containing the list of English stop words?)

(List from https://github.com/apache/lucene/blob/d5d6dc079395c47cd6d12dcce3bcfdd2c7d9dc63/lucene/analysis/common/src/java/org/apache/lucene/analysis/en/EnglishAnalyzer.java#L48)

("a" "an" "and" "are" "as" "at" "be" "but" "by" "for" "if" "in"
 "into" "is" "it" "no" "not" "of" "on" "or" "such" "that" "the"
 "their" "then" "there" "these" "they" "this" "to" "was" "will" "with")

