build_index can't handle empty numpy files

Hello,

I’m currently running a workflow in Argo that generates several embedding files in parallel, based on a database search. If no data is found, the workflow writes an empty NumPy file:

np.save(os.path.join(output, "features", filename), np.empty(0, np.float32))

Sadly, build_index cannot handle those files:

Using 4 omp threads (processes), consider increasing --nb_cores if you have more
Launching the whole pipeline 04/08/2022, 09:54:53
Reading total number of vectors and dimension 04/08/2022, 09:54:53

  0%|          | 0/16 [00:00<?, ?it/s]
 19%|█▉        | 3/16 [00:00<00:00, 29.92it/s]
 56%|█████▋    | 9/16 [00:00<00:00, 87.73it/s]
>>> Finished "Reading total number of vectors and dimension" in 0.1517 secs
>>> Finished "Launching the whole pipeline" in 0.1517 secs
Traceback (most recent call last):
  File "/usr/local/bin/autofaiss", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.8/site-packages/autofaiss/external/quantize.py", line 395, in main
    fire.Fire({"build_index": build_index, "tune_index": tune_index, "score_index": score_index})
  File "/usr/local/lib/python3.8/site-packages/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/usr/local/lib/python3.8/site-packages/fire/core.py", line 466, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/usr/local/lib/python3.8/site-packages/fire/core.py", line 681, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/autofaiss/external/quantize.py", line 143, in build_index
    nb_vectors, vec_dim = read_total_nb_vectors_and_dim(
  File "/usr/local/lib/python3.8/site-packages/autofaiss/readers/embeddings_iterators.py", line 258, in read_total_nb_vectors_and_dim
    for c in p.imap_unordered(file_to_line_count, file_paths):
  File "/usr/local/lib/python3.8/multiprocessing/pool.py", line 868, in next
    raise value
  File "/usr/local/lib/python3.8/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/usr/local/lib/python3.8/site-packages/autofaiss/readers/embeddings_iterators.py", line 252, in file_to_line_count
    return matrix_reader.get_row_count()
  File "/usr/local/lib/python3.8/site-packages/autofaiss/readers/embeddings_iterators.py", line 101, in get_row_count
    return self.get_shape()[0]

It would be great if build_index could handle this, either by just logging a warning or by offering a flag that allows empty files.
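Until something like that exists, one possible workaround is to filter out the empty files before handing the folder to build_index. The sketch below is only an illustration under a few assumptions: the helper names `read_npy_shape` and `non_empty_npy_files` are hypothetical (they are not part of autofaiss), the files are plain `.npy` files written with `np.save`, and "empty" means either a zero-byte file or an array whose shape contains a 0. It reads only the `.npy` header, so large embedding files are not loaded into memory.

```python
import logging
import os
import tempfile

import numpy as np
from numpy.lib import format as npy_format

logger = logging.getLogger(__name__)


def read_npy_shape(path):
    """Return the array shape stored in a .npy header without loading the data."""
    with open(path, "rb") as f:
        major, _minor = npy_format.read_magic(f)
        if major == 1:
            shape, _fortran_order, _dtype = npy_format.read_array_header_1_0(f)
        else:
            # Assumption: newer format versions share the 2.0 header layout.
            shape, _fortran_order, _dtype = npy_format.read_array_header_2_0(f)
    return shape


def non_empty_npy_files(folder):
    """Hypothetical helper: list .npy files in `folder` that hold at least one vector."""
    kept = []
    for name in sorted(os.listdir(folder)):
        if not name.endswith(".npy"):
            continue
        path = os.path.join(folder, name)
        if os.path.getsize(path) == 0:
            # A zero-byte file has no .npy header at all.
            logger.warning("Skipping zero-byte file: %s", path)
            continue
        shape = read_npy_shape(path)
        if not shape or 0 in shape:
            logger.warning("Skipping empty array %s with shape %s", path, shape)
            continue
        kept.append(path)
    return kept


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        np.save(os.path.join(folder, "found.npy"), np.zeros((3, 4), np.float32))
        np.save(os.path.join(folder, "nothing.npy"), np.empty(0, np.float32))
        print(non_empty_npy_files(folder))  # only found.npy is listed
```

The surviving files could then be copied or symlinked into a separate folder that is passed to build_index, so the index build only sees files that contain vectors.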

Issue Analytics

  • State: open
  • Created a year ago
  • Comments: 5

Top GitHub Comments

2 reactions
rom1504 commented, Apr 8, 2022

Given that stack trace, you’re using an outdated version of autofaiss. Can you try updating it?

0 reactions
sgutwein commented, Apr 8, 2022

It also fails if the file is just empty.
