Make methods in IndexReaderUtils more consistent re: Analyzer

We have:

# Pass in a no-op analyzer:
analyzer = pyanalysis.get_lucene_analyzer(stemming=False, stopwords=False)
index_utils.get_term_counts(term, analyzer=analyzer)
df, cf = index_utils.get_term_counts(term)

Here, we take an analyzer.

And:

# Fetch and traverse postings for an analyzed term:
postings_list = index_utils.get_postings_list(analyzed[0], analyze=False)
for posting in postings_list:
    print(f'docid={posting.docid}, tf={posting.tf}, pos={posting.positions}')

Here, we take a bool. Let’s make both consistent?

How about both take an analyzer and accept None? Passing in a “no-op” analyzer seems a bit janky.

Thoughts? @PepijnBoers @Chriskamphuis

Issue Analytics

State:
Created 3 years ago
Comments: 10 (3 by maintainers)

Top GitHub Comments

chriskamphuis commented, May 9, 2020

you can just do:

def get_term_counts(self, term: str, analyzer=get_lucene_analyzer()) -> Tuple[int, int]:
    if analyzer is None:
        # skip analysis (pass dummy to Anserini)
        ...
    else:
        # perform analysis with analyzer (either default or custom)
        ...

where you import that function from pyanalysis
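
A minimal, self-contained sketch of that idea applied to both methods, so that each takes an analyzer and treats None as "skip analysis". Everything here is a hypothetical stand-in rather than Pyserini's or Anserini's actual API: `ToyIndex` is an in-memory dictionary, and `lowercase_analyzer` plays the role of the default analyzer.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical stand-in for a Lucene analyzer: maps raw text to a list of tokens.
Analyzer = Callable[[str], List[str]]


def lowercase_analyzer(text: str) -> List[str]:
    """Toy default analyzer (assumption): lowercase, then split on whitespace."""
    return text.lower().split()


class ToyIndex:
    """Hypothetical in-memory index; not the real IndexReaderUtils."""

    def __init__(self, docs: Dict[str, str]) -> None:
        # Maps each indexed term to a list of (docid, term frequency) pairs.
        self.postings: Dict[str, List[Tuple[str, int]]] = {}
        for docid, text in docs.items():
            tokens = lowercase_analyzer(text)
            for token in sorted(set(tokens)):
                self.postings.setdefault(token, []).append((docid, tokens.count(token)))

    @staticmethod
    def _analyze(term: str, analyzer: Optional[Analyzer]) -> str:
        # None means the caller already analyzed the term: use it verbatim.
        if analyzer is None:
            return term
        tokens = analyzer(term)
        if len(tokens) != 1:
            raise ValueError(f'expected exactly one token, got {tokens}')
        return tokens[0]

    def get_postings_list(self, term: str, analyzer: Optional[Analyzer] = lowercase_analyzer):
        return self.postings.get(self._analyze(term, analyzer), [])

    def get_term_counts(self, term: str, analyzer: Optional[Analyzer] = lowercase_analyzer):
        postings = self.get_postings_list(term, analyzer)
        df = len(postings)                  # number of documents containing the term
        cf = sum(tf for _, tf in postings)  # total occurrences across the collection
        return df, cf


index = ToyIndex({'d1': 'Cities and cities', 'd2': 'a city', 'd3': 'Cities'})
print(index.get_term_counts('Cities'))                  # analyzed with the default
print(index.get_term_counts('cities', analyzer=None))   # already analyzed, looked up as-is
```

Because both methods route through the same `_analyze` helper, the meaning of the `analyzer` argument cannot drift between them, which is the inconsistency this issue is about.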

PepijnBoers commented, May 9, 2020

Looks good, but then we have to specify default somewhere, otherwise we face a NameError. The question would then also be how/where to define default, right?
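
One common Python pattern for the "where to define the default" question is a module-level sentinel: the default analyzer is built inside the function only when the argument is omitted, which leaves None free to mean "skip analysis". This is only an illustration of the pattern, not what the maintainers settled on; `_DEFAULT`, `build_default_analyzer`, `resolve_analyzer`, and `analyze_term` are hypothetical names that do not exist in Pyserini.

```python
from typing import Callable, List, Optional

# Hypothetical stand-in for a Lucene analyzer: maps raw text to a list of tokens.
Analyzer = Callable[[str], List[str]]

# Module-level sentinel (hypothetical): distinguishes "argument omitted" from None.
_DEFAULT = object()


def build_default_analyzer() -> Analyzer:
    """Stand-in for constructing the library's default analyzer (assumption)."""
    return lambda text: text.lower().split()


def resolve_analyzer(analyzer=_DEFAULT) -> Optional[Analyzer]:
    # Omitted argument: build the default lazily, at call time.
    if analyzer is _DEFAULT:
        return build_default_analyzer()
    # Explicit None or a custom analyzer: hand it back unchanged.
    return analyzer


def analyze_term(term: str, analyzer=_DEFAULT) -> List[str]:
    resolved = resolve_analyzer(analyzer)
    if resolved is None:
        return [term]  # skip analysis entirely
    return resolved(term)


print(analyze_term('Cities'))                 # default analyzer applied
print(analyze_term('Cities', analyzer=None))  # analysis skipped
```

The sentinel lives in one place next to the functions that use it, so there is no undefined name in any signature, and the default is not constructed at import time.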
