Strange sparsity results
Hi!
I’ve noticed some potentially wrong sparsity (L0/Distance_1) results due to some very small numbers.
When running Carla’s benchmarking code for the cem-vae method on the first 10 test observations:
from carla.data.catalog import OnlineCatalog
from carla.models.catalog import MLModelCatalog
from carla.models.negative_instances import predict_negative_instances
from carla import Benchmark
import carla.recourse_methods.catalog as recourse_catalog
import torch

dataset = OnlineCatalog("adult")
torch.manual_seed(0)
n_test = 10

ml_model = MLModelCatalog(
    dataset,
    model_type="ann",
    load_online=False,
    backend="pytorch"
)
ml_model.train(
    learning_rate=0.002,
    epochs=20,
    batch_size=1024,
    hidden_size=[18, 9, 3],
    force_train=True,
)

hyperparams = {
    "data_name": "adult",
    "batch_size": 1,
    "kappa": 0.1,
    "init_learning_rate": 0.01,
    "binary_search_steps": 9,
    "max_iterations": 100,
    "initial_const": 10,
    "beta": 0.9,
    "gamma": 1.0,  # 0.0, # 1.0
    "mode": "PN",
    "num_classes": 2,
    "ae_params": {"hidden_layer": [20, 10, 7], "train_ae": True, "epochs": 5},
}
from tensorflow import Graph, Session

graph = Graph()
with graph.as_default():
    ann_sess = Session()
    with ann_sess.as_default():
        ml_model_sess = MLModelCatalog(dataset, "ann", "tensorflow")

        factuals_sess = predict_negative_instances(ml_model_sess, dataset.df)
        factuals_sess = factuals_sess.iloc[:n_test].reset_index(drop=True)

        cem = recourse_catalog.CEM(ann_sess, ml_model_sess, hyperparams)
        df_cfs = cem.get_counterfactuals(factuals_sess)

        benchmark = Benchmark(ml_model, cem, factuals_sess)
        distances = benchmark.compute_distances()
        distances.Distance_1[0]  # equal to 5
I get that the first sparsity/Distance_1 value is equal to 5. When printing out the factual and the counterfactual for this test observation, the two vectors are almost the same (the only difference is ‘capital-gain’).
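Roughly, the comparison I did looks like this (a sketch rather than my exact code; it reuses factuals_sess and df_cfs from the snippet above and the same get_ordered_features helper that the benchmark code uses):

import pandas as pd

# Put the first factual and its counterfactual side by side,
# using the same feature ordering as the benchmark code.
f = ml_model.get_ordered_features(factuals_sess).iloc[0]
cf = ml_model.get_ordered_features(df_cfs).iloc[0]
print(pd.DataFrame({"factual": f, "counterfactual": cf, "delta": f - cf}))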
The reason for this problem is that the Distance_1 code looks something like this:
import numpy as np

arr_f = ml_model.get_ordered_features(benchmark._factuals).to_numpy()
arr_cf = ml_model.get_ordered_features(benchmark._counterfactuals).to_numpy()
delta = arr_f - arr_cf

d1 = np.sum(delta != 0, axis=1, dtype=np.float).tolist()
For the first observation, delta (the difference between the factual and the counterfactual) contains entries that are very small but not exactly zero, which leads to a wrong calculation of d1.
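As a toy illustration of what goes wrong (the magnitudes below are made up; the real deltas come from the benchmark, but the effect is the same), every non-zero floating-point leftover gets counted as a changed feature, whereas a tolerance-based comparison such as np.isclose would not count it:

import numpy as np

# One genuinely changed feature plus a few floating-point leftovers.
delta = np.array([[0.0, 3e-17, -1e-16, 0.25, 5e-18]])

print(np.sum(delta != 0, axis=1))                        # [4] - tiny leftovers are counted
print(np.sum(~np.isclose(delta, 0, atol=1e-5), axis=1))  # [1] - only the real change

That said, I am not sure what tolerance would be appropriate here, especially for the one-hot encoded columns.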
Any suggestions on how to fix this delta/rounding problem?
Thanks!
Top GitHub Comments
Yeah that sounds good.
That’s no problem at all!