Discrepancy in counterfactual indexing for CLUE generator
Hello!
When generating counterfactuals using the CLUE counterfactual generator, the resulting counterfactuals in the dataframe are indexed with a RangeIndex instead of the original index found in the factuals. This is a problem when the factuals' index is not a range index, e.g. after using .sample(n). This can be seen by running this code:
data_name = "compas"
model = MLModelCatalog(dataset, "ann", backend="pytorch")
model.train(...)
hyperparams = {...}
cl = Clue(dataset, model, hyperparams)
wa = Wachter(model, {...})
factuals = predict_negative_instances(model, dataset._df).sample(10)
cl_counterfactuals = cl.get_counterfactuals(factuals)
wa_counterfactuals = wa.get_counterfactuals(factuals)
display(factuals.index)
display(cl_counterfactuals.index)
display(wa_counterfactuals.index)
This yields:
Int64Index([4886, 4389, 2317, 797, 3154, 4685, 956, 3014, 99, 510], dtype='int64')
RangeIndex(start=0, stop=10, step=1)
Int64Index([4886, 4389, 2317, 797, 3154, 4685, 956, 3014, 99, 510], dtype='int64')
Now, if we try to benchmark the CLUE counterfactual generator:
benchmark = Benchmark(model, cl, factuals)
benchmark.run_benchmark()
we get:
ValueError: Can only compare identically-labeled DataFrame objects
This stems from the constraint_violation check in carla\evaluation\violations.py.
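The error itself can be reproduced with pandas alone. The sketch below does not use CARLA and does not claim to mirror the exact comparison made in violations.py; the column name and values are made up for illustration. It only shows that an element-wise comparison of two dataframes with different index labels raises this ValueError.

```python
import pandas as pd

# Stand-in for factuals after .sample(n): a non-contiguous integer index.
factuals = pd.DataFrame({"age": [25, 40, 33]}, index=[4886, 4389, 2317])

# Stand-in for counterfactuals that lost the original index (RangeIndex 0..2).
counterfactuals = pd.DataFrame({"age": [25, 41, 33]})

message = None
try:
    # Element-wise comparison requires both frames to carry identical labels.
    changed = factuals != counterfactuals
except ValueError as err:
    message = str(err)

print(message)
```

The exact wording of the message differs slightly between pandas versions, but it always refers to identically-labeled DataFrame objects.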
The counterfactuals get re-indexed in counterfactuals.check_counterfactuals for CLUE, FACE, GrowingSpheres and REVISE, since they all pass in a list of counterfactuals as opposed to a pandas dataframe, so at least those methods can run into this problem.
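A small pandas-only sketch of why passing a list loses the index. The variable names (cf_rows, cfs_df) and the toy columns are assumptions for illustration, not the code inside check_counterfactuals: a plain list of rows carries no labels, so a dataframe built from it falls back to a default RangeIndex.

```python
import pandas as pd

factuals = pd.DataFrame(
    {"age": [25, 40, 33], "priors": [0, 2, 1]},
    index=[4886, 4389, 2317],
)

# A recourse method that returns its result as a plain list of rows
# drops the index labels along the way.
cf_rows = factuals.to_numpy().tolist()

# Rebuilding a dataframe from the list yields a fresh default index.
cfs_df = pd.DataFrame(cf_rows, columns=factuals.columns)

print(factuals.index)
print(cfs_df.index)  # RangeIndex(start=0, stop=3, step=1)
```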
One way of fixing this issue would be to pass an index attribute to check_counterfactuals and add an index=indices on line 32 to set the proper indices. Another would be to simply do cfs_df.index = factuals.index after the function is called to restore the indices.
Since the counterfactuals now get new indices, and both dataframes are of the same size, restoring the ones from factuals shouldn’t cause any further problems I can spot and will only improve the consistency across the recourse methods.
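Both proposed fixes can be sketched in plain pandas. This is only an illustration under the assumption that the counterfactual rows come back in the same order as the factuals; the names cf_rows, cfs_with_index and cfs_patched are hypothetical and the real signature of check_counterfactuals is not reproduced here.

```python
import pandas as pd

factuals = pd.DataFrame({"age": [25, 40, 33]}, index=[4886, 4389, 2317])

# Counterfactuals as a plain list of rows, in the same order as the factuals.
cf_rows = [[25], [41], [33]]

# Option 1: hand the index over when the dataframe is built.
cfs_with_index = pd.DataFrame(
    cf_rows, columns=factuals.columns, index=factuals.index
)

# Option 2: build the dataframe first, then overwrite its RangeIndex.
cfs_patched = pd.DataFrame(cf_rows, columns=factuals.columns)
cfs_patched.index = factuals.index

# With matching labels the element-wise comparison works again.
changed = factuals != cfs_with_index
print(changed["age"].tolist())  # [False, True, False]
```

Either option gives the same labels; the first keeps the fix inside the function, while the second leaves the function untouched and patches the result at each call site.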
If any of the proposed fixes are acceptable, or there are other ways to fix the issue, I would be happy to perform them and be assigned to this issue. 😃
Top GitHub Comments
Since my commit doesn’t change these files, I skipped the pre-commit install step. The error will still appear if one tries to push with pre-commit. It also happened before I made changes, so I guess these files have to be fixed.
That would be great! I tried replicating the problem, but for me it works. So it’s a bit difficult for me to test if something fixes it. I’m mainly wondering why it doesn’t work for you but does for me.