IndexError using impute_new_data

See original GitHub issue

I am trying to impute new data using the kernel.

from datetime import datetime

start_t = datetime.now()
new_data_imputed = kernel.impute_new_data(new_data=new_sub)
print(f"New Data imputed in {(datetime.now() - start_t).total_seconds()} seconds")

But, I keep getting an IndexError:

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
/var/folders/36/j_203fcj42q9bvnlt1sl3j640000gp/T/ipykernel_54504/3730750411.py in <module>
      2 
      3 start_t = datetime.now()
----> 4 new_data_imputed = kernel.impute_new_data(new_data=new_sub)
      5 print(f"New Data imputed in {(datetime.now() - start_t).total_seconds()} seconds")

~/.pyenv/versions/3.8.3/lib/python3.8/site-packages/miceforest/ImputationKernel.py in impute_new_data(self, new_data, datasets, iterations, save_all_iterations, copy_data, random_state, verbose)
   1233                         )
   1234                     )
-> 1235                     imputed_data._insert_new_data(
   1236                         dataset=ds, variable_index=var, new_data=imp_values
   1237                     )

~/.pyenv/versions/3.8.3/lib/python3.8/site-packages/miceforest/ImputedData.py in _insert_new_data(self, dataset, variable_index, new_data)
    387         view = _slice(self.working_data, col_slice=variable_index)
    388         if view.dtype.name == "category":
--> 389             new_data = np.array(view.cat.categories)[new_data]
    390 
    391         _assign_col_values_without_copy(

IndexError: index 1 is out of bounds for axis 0 with size 1

Shape of original data: (33008, 71)
Shape of new_sub: (15, 71)

Both datasets have columns that are all of the same data type. What could be causing this issue?
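
One possible way to narrow this down: two columns can both report the dtype name "category" and still carry different category sets. The sketch below is an illustration only and is not part of miceforest. The helper name mismatched_categories and the toy frames are hypothetical, and the code relies on nothing but pandas.

```python
import pandas as pd


def mismatched_categories(reference, new):
    """Return the categorical columns whose dtype differs between two frames.

    Hypothetical helper: maps each offending column name to a tuple of
    (categories in reference, categories in new).
    """
    mismatches = {}
    for col in reference.columns:
        ref_dtype = reference[col].dtype
        new_dtype = new[col].dtype
        # CategoricalDtype equality compares the category sets, not just the name.
        if isinstance(ref_dtype, pd.CategoricalDtype) and ref_dtype != new_dtype:
            mismatches[col] = (
                list(ref_dtype.categories),
                list(getattr(new_dtype, "categories", [])),
            )
    return mismatches


# Toy data: "color" is categorical in both frames, but the new frame only saw "red".
reference = pd.DataFrame({"color": pd.Categorical(["red", "blue"]), "size": [1.0, 2.0]})
new = pd.DataFrame({"color": pd.Categorical(["red", "red"]), "size": [3.0, 4.0]})

report = mismatched_categories(reference, new)
print(report)
```

An empty result would suggest the categorical columns line up; any entry points at a column whose category set should be aligned before imputing.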

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 7 (4 by maintainers)

Top GitHub Comments

2 reactions
AnotherSamWilson commented, Dec 20, 2021

You can avoid this by casting each column of the new data to the dtype of the same column in the original data, so that the categorical columns share the original category set. Running this should solve it:

for col in data.columns:
    new_sub[col] = new_sub[col].astype(data[col].dtype)

This ensures that every category from the original data is recognized, even if some of them do not appear in the new data.
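
To see why the cast helps, here is a small sketch that reproduces the failing lookup from the traceback with plain pandas and NumPy. It does not call miceforest. The toy frames and the predicted_codes array (standing in for the integer category codes that a model trained on the original data might predict) are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Original data knows two categories; pandas sorts them as ['blue', 'red'].
data = pd.DataFrame({"color": pd.Categorical(["red", "blue", "red"])})

# New data was built separately and only ever saw 'red', so it has one category.
new_sub = pd.DataFrame({"color": pd.Categorical(["red", None])})

# Assumed model output: code 1 means 'red' in the ORIGINAL category order.
predicted_codes = np.array([1])

# Same lookup pattern as in the traceback: index the category array with codes.
try:
    np.array(new_sub["color"].cat.categories)[predicted_codes]
    error_message = None
except IndexError as exc:
    error_message = str(exc)
print("before cast:", error_message)

# The suggested fix: reuse the original dtypes, including their category sets.
for col in data.columns:
    new_sub[col] = new_sub[col].astype(data[col].dtype)

# The lookup now succeeds because both frames share the same categories.
decoded = np.array(new_sub["color"].cat.categories)[predicted_codes]
print("after cast:", decoded)
```

The cast leaves the existing values and missing entries in new_sub as they were; it only widens the category set to match the original data.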

0 reactions
KaikeWesleyReis commented, Jul 26, 2022

@AnotherSamWilson Just to mention a concern here: if we use mean_matching_candidates != 0 when defining the kernel, the imputation will fail on category dtype columns. If this is expected, it should be stated clearly in the kernel documentation.
