Index size bigger than the original data

Is it normal to get an index file which is larger (in size) than the original data file?

I have a JSON file of about 1 MB (roughly 58,278 words). I build an index for it using the following code:

const FlexSearch = require("flexsearch");

const index_ar = new FlexSearch({
  tokenize: "strict",
  rtl: true,
  split: /\s+/,
  doc: {
    id: "id",
    field: [
      'title',
      'incident_date_time',
      'location:name'
    ]
  }
});
// `data` is the array of documents parsed from the JSON file
index_ar.add(data);

The index file size is ~2.1 MB! I measured it using the following method:

const fs = require("fs");

const exportedIndex = index_ar.export();
fs.writeFileSync('exported.json', JSON.stringify(exportedIndex));

Is there anything wrong in the code, or is it normal to get an index that is bigger than the original data?

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments: 7 (4 by maintainers)

Top GitHub Comments

1 reaction
desjob commented, Jul 16, 2019

I would say it is expected that the index is always larger than the original data, because the index duplicates data in order to search fast.

Think of it as adding a traditional index to an existing book: for each word that occurs in the book, you add a list on which pages it occurs. Now your book is several pages thicker!
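
The book analogy can be sketched as a tiny inverted index in plain JavaScript. This is only an illustration of the idea, not FlexSearch's internal format: the sample documents and the `buildInvertedIndex` helper are hypothetical, and real tokenizers store considerably more than one entry per whole word.

```javascript
// Hypothetical sample documents; not taken from the issue's data set.
const sampleDocs = [
  { id: 1, title: "fire reported near the old market" },
  { id: 2, title: "market fire contained" },
  { id: 3, title: "road closed near the market" },
];

// Build a word -> [document ids] map, splitting on whitespace
// (comparable in spirit to whole-word tokenization).
function buildInvertedIndex(records) {
  const index = Object.create(null); // no inherited keys such as "constructor"
  for (const { id, title } of records) {
    // A Set ensures each document is listed once per word.
    for (const word of new Set(title.split(/\s+/))) {
      if (!index[word]) index[word] = [];
      index[word].push(id);
    }
  }
  return index;
}

const invertedIndex = buildInvertedIndex(sampleDocs);

// Measure the serialized sizes: the index is extra data kept next to the documents.
const dataBytes = Buffer.byteLength(JSON.stringify(sampleDocs));
const indexBytes = Buffer.byteLength(JSON.stringify(invertedIndex));

console.log(invertedIndex["market"]); // ids of the documents containing "market"
console.log({ dataBytes, indexBytes, totalBytes: dataBytes + indexBytes });
```

Whenever the documents are stored together with the index, the total is necessarily larger than the documents alone, which is the "thicker book" effect described above.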

0 reactions
tareefdev commented, Aug 2, 2019

This is a really nice option, thanks for the great work.

I want to emphasize that particular use case again: using only very basic FlexSearch functionality (“strict” tokenizer, non-contextual, split by words). In this case, I don’t see any difference in performance or data size between:

  • Building an index and loading it along with a copy of my data
  • Just loading the data and then using a framework (React) to filter and display it.

What are the benefits of using FlexSearch here?
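
For reference, the two options being compared can be sketched in plain JavaScript. This is a minimal illustration under assumptions, not FlexSearch's API: the records and the `filterByWord` and `buildLookup` helpers are hypothetical, and no timing or size figures are claimed.

```javascript
// Hypothetical records; field names are for illustration only.
const records = [
  { id: 1, title: "fire reported near the old market" },
  { id: 2, title: "market fire contained" },
  { id: 3, title: "road closed near the market" },
];

// Option 1: no index. Every query scans every record.
function filterByWord(items, word) {
  return items.filter((item) => item.title.split(/\s+/).includes(word));
}

// Option 2: build a word -> records lookup once,
// then answer each query with a single Map access.
function buildLookup(items) {
  const lookup = new Map();
  for (const item of items) {
    for (const word of new Set(item.title.split(/\s+/))) {
      if (!lookup.has(word)) lookup.set(word, []);
      lookup.get(word).push(item);
    }
  }
  return lookup;
}

const wordLookup = buildLookup(records);

const viaFilter = filterByWord(records, "fire").map((r) => r.id);
const viaLookup = (wordLookup.get("fire") || []).map((r) => r.id);

console.log(viaFilter, viaLookup); // both approaches return the same ids
```

Both approaches return the same matches for whole-word queries. The structural difference is that the filter re-reads all records on each query, while the lookup pays its cost once at build time and in extra memory.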
