Make methods in IndexReaderUtils more consistent re: Analyzer

We have:

# Pass in a no-op analyzer:
analyzer = pyanalysis.get_lucene_analyzer(stemming=False, stopwords=False)
df, cf = index_utils.get_term_counts(term, analyzer=analyzer)
# Or omit the analyzer argument:
df, cf = index_utils.get_term_counts(term)

Here, we take an analyzer.

And:

# Fetch and traverse postings for an analyzed term:
postings_list = index_utils.get_postings_list(analyzed[0], analyze=False)
for posting in postings_list:
    print(f'docid={posting.docid}, tf={posting.tf}, pos={posting.positions}')

Here, we take a bool. Let’s make both consistent?

How about both take an analyzer and accept None? Passing in a “no-op” analyzer seems a bit janky.

Thoughts? @PepijnBoers @Chriskamphuis
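
To make the proposal concrete, here is a minimal sketch of the convention under discussion: both lookups take an `analyzer` argument, and `None` means "skip analysis". It does not use Pyserini or Anserini. The toy index, `lowercase_analyzer`, and `resolve_term` are hypothetical stand-ins invented for illustration, and the real functions may differ.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical stand-in for a Lucene analyzer: a callable from text to tokens.
Analyzer = Callable[[str], List[str]]

# Toy in-memory index (assumption): term -> list of (docid, term frequency).
TOY_INDEX: Dict[str, List[Tuple[str, int]]] = {
    "city": [("d1", 2), ("d3", 1)],
}


def lowercase_analyzer(text: str) -> List[str]:
    """Toy default analyzer: lowercase, then split on whitespace."""
    return text.lower().split()


def resolve_term(term: str, analyzer: Optional[Analyzer]) -> str:
    """Return the term to look up: unchanged if analyzer is None, analyzed otherwise."""
    if analyzer is None:
        return term  # caller says the term is already in index form
    tokens = analyzer(term)
    if len(tokens) != 1:
        raise ValueError(f"expected exactly one token, got {tokens!r}")
    return tokens[0]


def get_term_counts(term: str,
                    analyzer: Optional[Analyzer] = lowercase_analyzer) -> Tuple[int, int]:
    """Return (document frequency, collection frequency) for the term."""
    postings = TOY_INDEX.get(resolve_term(term, analyzer), [])
    return len(postings), sum(tf for _, tf in postings)


def get_postings_list(term: str,
                      analyzer: Optional[Analyzer] = lowercase_analyzer) -> List[Tuple[str, int]]:
    """Return the postings for the term, using the same analyzer convention."""
    return TOY_INDEX.get(resolve_term(term, analyzer), [])
```

With this shape, `get_term_counts("City")` analyzes the term first, while `get_postings_list("city", analyzer=None)` looks it up as given, so neither call needs a bool flag or a no-op analyzer.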

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 10 (3 by maintainers)

Top GitHub Comments

2 reactions
chriskamphuis commented, May 9, 2020

you can just do:

def get_term_counts(self, term: str, analyzer=get_lucene_analyzer()) -> Tuple[int, int]:
    if analyzer is None:
        # skip analysis (pass dummy to Anserini)
        ...
    else:
        # perform analysis with analyzer (either default or custom)
        ...

where you import that function from pyanalysis

0 reactions
PepijnBoers commented, May 9, 2020

Looks good, but then we have to specify default somewhere, otherwise we face a NameError. The question would then also be how/where to define default, right?
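
One possible answer to the "where to define default" question, offered only as a sketch and not as what the maintainers chose, is a module-level sentinel plus a lazily built default. The names `_USE_DEFAULT`, `get_default_analyzer`, and `analyze_term` are hypothetical, and the toy analyzer stands in for a real Lucene analyzer.

```python
from typing import Callable, List, Optional, Union

# Hypothetical stand-in for a Lucene analyzer: a callable from text to tokens.
Analyzer = Callable[[str], List[str]]


class _UseDefault:
    """Sentinel type meaning 'the caller did not pass an analyzer'."""


_USE_DEFAULT = _UseDefault()
_default_analyzer: Optional[Analyzer] = None


def get_default_analyzer() -> Analyzer:
    """Build the default analyzer on first use and cache it at module level."""
    global _default_analyzer
    if _default_analyzer is None:
        # Toy default (assumption): lowercase, then split on whitespace.
        _default_analyzer = lambda text: text.lower().split()
    return _default_analyzer


def analyze_term(term: str,
                 analyzer: Union[Analyzer, None, _UseDefault] = _USE_DEFAULT) -> List[str]:
    """Analyze a term; None skips analysis, omitting the argument uses the default."""
    if analyzer is None:
        return [term]  # explicit None: leave the term untouched
    if isinstance(analyzer, _UseDefault):
        analyzer = get_default_analyzer()
    return analyzer(term)
```

Because the sentinel is defined in the same module as the function, the signature never refers to an undefined name, and the default analyzer is only constructed when a caller actually relies on it.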
