IndexError using impute_new_data
I am trying to impute new data using the kernel.
from datetime import datetime
start_t = datetime.now()
new_data_imputed = kernel.impute_new_data(new_data=new_sub)
print(f"New Data imputed in {(datetime.now() - start_t).total_seconds()} seconds")
But I keep getting an IndexError:
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
/var/folders/36/j_203fcj42q9bvnlt1sl3j640000gp/T/ipykernel_54504/3730750411.py in <module>
2
3 start_t = datetime.now()
----> 4 new_data_imputed = kernel.impute_new_data(new_data=new_sub)
5 print(f"New Data imputed in {(datetime.now() - start_t).total_seconds()} seconds")
~/.pyenv/versions/3.8.3/lib/python3.8/site-packages/miceforest/ImputationKernel.py in impute_new_data(self, new_data, datasets, iterations, save_all_iterations, copy_data, random_state, verbose)
1233 )
1234 )
-> 1235 imputed_data._insert_new_data(
1236 dataset=ds, variable_index=var, new_data=imp_values
1237 )
~/.pyenv/versions/3.8.3/lib/python3.8/site-packages/miceforest/ImputedData.py in _insert_new_data(self, dataset, variable_index, new_data)
387 view = _slice(self.working_data, col_slice=variable_index)
388 if view.dtype.name == "category":
--> 389 new_data = np.array(view.cat.categories)[new_data]
390
391 _assign_col_values_without_copy(
IndexError: index 1 is out of bounds for axis 0 with size 1
Shape of original data: (33008, 71)
Shape of new_sub: (15, 71)
Both datasets have the same columns, and each column has the same data type in both. What could be causing this issue?
Issue Analytics
- Created 2 years ago
- Comments: 7 (4 by maintainers)
Top GitHub Comments
You can avoid this by setting the dtypes of the category columns in the new data equal to the category dtypes in the original data. This ensures the categories are recognized, even if they do not occur in the new data.
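A minimal sketch of that dtype alignment, using only pandas and NumPy. The frames `original` and `new_sub` below are small stand-ins for the data in the question, and `align_categories` is a hypothetical helper name, not part of miceforest. The first part reproduces the lookup from the traceback in isolation: a column whose category list has one entry cannot be indexed with code 1.

```python
import numpy as np
import pandas as pd


def align_categories(new_data: pd.DataFrame, reference: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of new_data whose categorical columns use reference's dtypes.

    Hypothetical helper for illustration; not a miceforest function.
    """
    aligned = new_data.copy()
    for col in reference.columns:
        if isinstance(reference[col].dtype, pd.CategoricalDtype):
            # Casting to the reference dtype carries over the full category list.
            aligned[col] = aligned[col].astype(reference[col].dtype)
    return aligned


# Stand-in data (assumption): the original data has three categories,
# the small new dataset happens to contain only one of them.
original = pd.DataFrame({"color": pd.Categorical(["red", "blue", "red", "green"])})
new_sub = pd.DataFrame({"color": pd.Categorical(["red", None, "red"])})

# Same kind of lookup as in the traceback: category code 1 does not exist
# when the new data only knows a single category.
try:
    np.array(new_sub["color"].cat.categories)[np.array([1])]
except IndexError as exc:
    print(f"IndexError: {exc}")

new_sub = align_categories(new_sub, original)
print(list(new_sub["color"].cat.categories))
```

After the alignment, the new data carries the same category list as the original data, so the category codes predicted by the kernel can be mapped back to labels. Missing values in the new data stay missing and can then be passed to kernel.impute_new_data as before.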
@AnotherSamWilson Just to mention a concern here: if we use mean_matching_candidates != 0 in the kernel definition, the imputation will fail for category dtype columns. If this is expected, it should be made clear in the kernel description.