AnchorText - IndexError (dimension mismatch)

Hi, I’m getting this error: IndexError: boolean index did not match indexed array along dimension 0; dimension is 100 but corresponding boolean dimension is 1

Any help would be greatly appreciated!!

Here’s the full stack trace.

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-64-b6e19739979d> in <module>
      1 np.random.seed(0)
----> 2 explanation = explainer.explain(text, threshold=0.95, use_unk=True)

/usr/local/lib/python3.6/dist-packages/alibi/explainers/anchor_text.py in explain(self, text, use_unk, use_similarity_proba, sample_proba, top_n, temperature, threshold, delta, tau, batch_size, coverage_samples, beam_size, stop_on_first, max_anchor_size, min_samples_start, n_covered_ex, binary_cache_size, cache_margin, verbose, verbose_every, **kwargs)
    562             verbose=verbose,
    563             verbose_every=verbose_every,
--> 564             **kwargs,
    565         )  # type: Any
    566         result['names'] = [self.words[x] for x in result['feature']]

/usr/local/lib/python3.6/dist-packages/alibi/explainers/anchor_base.py in anchor_beam(self, delta, epsilon, desired_confidence, beam_size, epsilon_stop, min_samples_start, max_anchor_size, stop_on_first, batch_size, coverage_samples, verbose, verbose_every, **kwargs)
    666 
    667         # sample by default 1 or min_samples_start more random value(s)
--> 668         (pos,), (total,) = self.draw_samples([()], min_samples_start)
    669 
    670         # mean = fraction of labels sampled data that equals the label of the instance to be explained, ...

/usr/local/lib/python3.6/dist-packages/alibi/explainers/anchor_base.py in draw_samples(self, anchors, batch_size)
    355         sample_stats, pos, total = [], (), ()  # type: List, Tuple, Tuple
    356         samples_iter = [self.sample_fcn((i, tuple(self.state['t_order'][anchor])), num_samples=batch_size)
--> 357                         for i, anchor in enumerate(anchors)]
    358         for samples, anchor in zip(samples_iter, anchors):
    359             covered_true, covered_false, labels, *additionals, _ = samples

/usr/local/lib/python3.6/dist-packages/alibi/explainers/anchor_base.py in <listcomp>(.0)
    355         sample_stats, pos, total = [], (), ()  # type: List, Tuple, Tuple
    356         samples_iter = [self.sample_fcn((i, tuple(self.state['t_order'][anchor])), num_samples=batch_size)
--> 357                         for i, anchor in enumerate(anchors)]
    358         for samples, anchor in zip(samples_iter, anchors):
    359             covered_true, covered_false, labels, *additionals, _ = samples

/usr/local/lib/python3.6/dist-packages/alibi/explainers/anchor_text.py in sampler(self, anchor, num_samples, compute_labels)
    176         if compute_labels:
    177             labels = self.compare_labels(raw_data)
--> 178             covered_true = raw_data[labels][:self.n_covered_ex]
    179             covered_false = raw_data[np.logical_not(labels)][:self.n_covered_ex]
    180             # coverage set to -1.0 as we can't compute 'true'coverage for this model

IndexError: boolean index did not match indexed array along dimension 0; dimension is 100 but corresponding boolean dimension is 1

Issue Analytics

State:
Created 3 years ago
Comments: 24 (11 by maintainers)

Top GitHub Comments

tiru1930 commented, Aug 23, 2021

@jklaise stub.zip

It needs fasttext, spacy, pandas, and alibi.

RobertSamoilescu commented, Aug 24, 2021

@tiru1930, the main problem is the way you define the prediction function. The internals of Anchor expect the prediction function to handle batches of instances, and I believe the function in your example expects a single instance. Note that the output is expected to be an np.ndarray. It does not matter whether you predict labels or a probability distribution, because in the latter case argmax is applied along axis=1.

The second issue relates to the prediction function of a fasttext model. It seems that the predicted labels are permuted into decreasing order of likelihood, so the most probable class is always placed first in the prediction output and taking argmax will always predict label 0. Please check the official documentation to confirm that.
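
The batching point can be illustrated with plain NumPy, without alibi or fasttext. This is only a sketch: both predictors below are hypothetical stand-ins, the label value 3 is made up, and it assumes the explainer compares the predictor output elementwise against the label of the instance being explained before using the result as a boolean mask, as the traceback line `raw_data[labels]` suggests.

```python
import numpy as np

# 100 perturbed samples, matching "dimension is 100" in the traceback
raw_data = np.array([f"sample {i}" for i in range(100)])
instance_label = 3  # hypothetical label of the instance being explained


def single_instance_predict(texts):
    # Hypothetical faulty predictor: ignores the batch and returns one label
    return np.array([3])


def batch_predict(texts):
    # Hypothetical batch-aware predictor: one label per input text
    return np.array([3 if i % 2 == 0 else 1 for i in range(len(texts))])


# Faulty case: the mask has shape (1,) while raw_data has shape (100,)
labels = single_instance_predict(raw_data) == instance_label
try:
    raw_data[labels]
    error_message = None
except IndexError as exc:
    error_message = str(exc)
print(error_message)

# Batch-aware case: the mask has shape (100,), so indexing works
labels = batch_predict(raw_data) == instance_label
covered_true = raw_data[labels]
print(covered_true.shape)
```

The exact wording of the IndexError varies between NumPy versions, but the cause is the same: the mask must have one entry per sample.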

Check the following example:


import pandas as pd
import fasttext as ft
from alibi.utils.download import spacy_model
from alibi.explainers import AnchorText
import spacy
import numpy as np

# read dataset and rename category column to label
df = pd.read_csv("test.csv")
df = df.rename(columns={"category": "label"})
df.head()

# add "__label__" prefix for each class label
df["label"] = df["label"].apply(lambda x: "__label__" + x)
df.to_csv("ft_train.csv", sep="\t", index=False, header=False)
print("# of unique labels: %d" % len(df.label.unique()))

# fit the model (you may play a bit with the hyperparams)
model = ft.train_supervised(input="ft_train.csv", lr=1.0, epoch=25, wordNgrams=2)
print(model.test("ft_train.csv"))

# select example to be explained
text = df.text.iloc[100]
label = df.label.iloc[100]
print("Text:", text, "\tLabel:", label)

# load spacy model
smodel = 'en_core_web_md'
spacy_model(model=smodel)
nlp = spacy.load(smodel)

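# define the prediction function: it takes a batch of texts and returns
# an np.ndarray with one integer class index per text, mapped through
# model.labels so the index does not depend on fasttext's output order
predict_fn = lambda x: np.array([model.labels.index(cls[0]) for cls in model.predict(x)[0]])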

# define explainer
explainer = AnchorText(
    predictor=predict_fn,
    sampling_strategy='unknown',
    nlp=nlp,
    seed=0
)

# explain the example
explanation = explainer.explain(text, threshold=0.95)

# print explanation
pred = model.labels[predict_fn([text])[0]]
print('Anchor: %s' % (' AND '.join(explanation.anchor)))
print('Precision: %.2f' % explanation.precision)
print('\nExamples where anchor applies and model predicts %s:' % pred)
print('\n'.join([x for x in explanation.raw['examples'][-1]['covered_true']]))
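
If the predictor should return a probability distribution rather than labels, the label-ordering issue mentioned above has to be handled as well. The following is a rough sketch with made-up data, not fasttext itself: `pred_labels` and `pred_probs` are hypothetical stand-ins for a prediction output in which each row is sorted by decreasing likelihood, `labels_fixed` stands in for `model.labels`, and `to_fixed_order` is a helper name introduced here for illustration.

```python
import numpy as np

# Hypothetical fixed class order, standing in for model.labels
labels_fixed = ["__label__a", "__label__b", "__label__c"]

# Hypothetical prediction output for two texts, each row sorted by
# decreasing likelihood (the behaviour described in the comment above)
pred_labels = [
    ["__label__b", "__label__a", "__label__c"],
    ["__label__c", "__label__b", "__label__a"],
]
pred_probs = [np.array([0.7, 0.2, 0.1]), np.array([0.6, 0.3, 0.1])]


def to_fixed_order(pred_labels, pred_probs, labels_fixed):
    """Rearrange per-row sorted probabilities into a fixed column order."""
    column = {name: i for i, name in enumerate(labels_fixed)}
    out = np.zeros((len(pred_labels), len(labels_fixed)))
    for row, (names, probs) in enumerate(zip(pred_labels, pred_probs)):
        for name, p in zip(names, probs):
            out[row, column[name]] = p
    return out


# argmax on the sorted output always picks position 0
naive = np.argmax(np.vstack(pred_probs), axis=1)

# argmax on the reordered matrix gives the index in labels_fixed
proba = to_fixed_order(pred_labels, pred_probs, labels_fixed)
fixed = np.argmax(proba, axis=1)

print(naive, fixed)
```

With the columns in a fixed order, applying argmax along axis=1 recovers a meaningful class index for each text.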
