Strange sparsity results

Hi!

I’ve noticed some potentially wrong sparsity (L0/Distance_1) results caused by very small numbers.

When running Carla’s benchmarking code for the cem-vae method and the first 10 test observations:

from carla.data.catalog import OnlineCatalog
from carla.models.catalog import MLModelCatalog
from carla.models.negative_instances import predict_negative_instances
from carla import Benchmark

import carla.recourse_methods.catalog as recourse_catalog

import torch

dataset = OnlineCatalog("adult")

torch.manual_seed(0)
n_test = 10
ml_model = MLModelCatalog(
        dataset, 
        model_type="ann", 
        load_online=False, 
        backend="pytorch"
    )

ml_model.train(
    learning_rate=0.002,
    epochs=20,
    batch_size=1024,
    hidden_size=[18, 9, 3],
    force_train=True, 
)

hyperparams = {
    "data_name": "adult",
    "batch_size": 1,
    "kappa": 0.1,
    "init_learning_rate": 0.01,
    "binary_search_steps": 9,
    "max_iterations": 100,
    "initial_const": 10,
    "beta": 0.9,
    "gamma": 1.0, # 0.0, #   1.0
    "mode": "PN",
    "num_classes": 2,
    "ae_params": {"hidden_layer": [20, 10, 7], "train_ae": True, "epochs": 5},
}

from tensorflow import Graph, Session

graph = Graph()
with graph.as_default():
    ann_sess = Session()
    with ann_sess.as_default():
        ml_model_sess = MLModelCatalog(dataset, "ann", "tensorflow")

        factuals_sess = predict_negative_instances(
            ml_model_sess, dataset.df
        )
        factuals_sess = factuals_sess.iloc[:n_test].reset_index(drop=True)

        cem = recourse_catalog.CEM(ann_sess, ml_model_sess, hyperparams)
        df_cfs = cem.get_counterfactuals(factuals_sess)
        benchmark = Benchmark(ml_model, cem, factuals_sess)

distances = benchmark.compute_distances()

distances.Distance_1[0] # equal to 5

I get that the first sparsity/Distance_1 value is equal to 5. When printing out the factual and counterfactual for this test observation, I see that the two vectors are almost the same (the only difference is ‘capital-gain’).

[image: printed factual and counterfactual for the first test observation]

The reason for this problem is that the distance_1 code looks something like this

import numpy as np

arr_f = ml_model.get_ordered_features(benchmark._factuals).to_numpy()
arr_cf = ml_model.get_ordered_features(
    benchmark._counterfactuals
).to_numpy()

delta = arr_f - arr_cf

d1 = np.sum(delta != 0, axis=1, dtype=float).tolist()

For the first observation, delta (the difference between the factual and the counterfactual) has really small (but not zero) numbers:

[image: printed delta for the first test observation]

Each of these tiny non-zero entries is counted as a changed feature, which leads to a wrong calculation of d1.
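
The effect can be reproduced without CARLA. Below is a minimal sketch with made-up values (not the numbers from the run above): the counterfactual differs from the factual in one feature, but float residue of order 1e-9 in four other features is also counted by the exact `!= 0` test.

```python
import numpy as np

# Hypothetical factual row (illustrative values, not taken from the adult dataset).
factual = np.array([[0.30, 0.00, 0.40, 0.80, 0.50, 1.00]])

# Counterfactual: one real change (index 1) plus tiny float residue in four features.
counterfactual = factual.copy()
counterfactual[0, 1] += 0.25
counterfactual[0, [0, 2, 3, 4]] += 1e-9

delta = factual - counterfactual

# Same counting rule as in the snippet above: every non-zero entry is a "change".
exact_count = np.sum(delta != 0, axis=1, dtype=float).tolist()
print(exact_count)  # [5.0], although only one feature really changed
```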

Any suggestions on how to fix this delta/rounding problem?

Thanks!
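
One possible workaround, sketched here as an assumption rather than as the fix adopted in CARLA, is to count a feature as changed only when its absolute difference exceeds a small tolerance. The function name `l0_distance` and the default `atol=1e-5` are hypothetical. A suitable tolerance depends on how the features are scaled.

```python
import numpy as np

def l0_distance(factuals, counterfactuals, atol=1e-5):
    """Count, per row, the features whose absolute change exceeds atol."""
    delta = np.asarray(factuals, dtype=float) - np.asarray(counterfactuals, dtype=float)
    # With rtol=0 and a reference value of 0, isclose reduces to |delta| <= atol.
    changed = ~np.isclose(delta, 0.0, rtol=0.0, atol=atol)
    return changed.sum(axis=1).astype(float).tolist()

# Illustrative data: one real change (0.25) and one float residue (1e-9).
f = np.array([[0.3, 0.0, 0.4]])
cf = np.array([[0.3 + 1e-9, 0.25, 0.4]])

print(l0_distance(f, cf))            # [1.0]
print(l0_distance(f, cf, atol=0.0))  # [2.0], same as the exact != 0 test
```

Rounding both arrays to a fixed number of decimals before subtracting would be an alternative, but values that sit on either side of a rounding boundary could then still be counted as different.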

Issue Analytics

  • State: closed
  • Created a year ago
  • Comments: 8 (3 by maintainers)

Top GitHub Comments

1 reaction
JohanvandenHeuvel commented, May 5, 2022

Yeah that sounds good.

0 reactions
JohanvandenHeuvel commented, May 10, 2022

That’s no problem at all!
