
[🐛BUG] Implausible metrics?

While trying out my implementation of SLIM with ElasticNet (https://github.com/RUCAIBox/RecBole/pull/621), I'm noticing some implausible numbers. The dataset is ml-100k with all defaults, and I'm using the default hyperparameters of my method defined in its yaml file (not yet well chosen, because these results are so far off): https://github.com/RUCAIBox/RecBole/blob/41a06e59ab26482dbfac641caac99876c167168c/recbole/properties/model/SLIMElastic.yaml
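
For context, SLIM with ElasticNet learns a sparse item-to-item weight matrix W by regressing each item's interaction column on all the other item columns, which is where the alpha, l1_ratio, positive_only and hide_item knobs come from. The snippet below is only a minimal sketch of that idea using scikit-learn, not the code from the PR, and the example hyperparameter values are illustrative rather than the yaml defaults:

import numpy as np
from scipy.sparse import csc_matrix
from sklearn.linear_model import ElasticNet

def fit_slim_elastic(X, alpha=0.2, l1_ratio=0.02, positive_only=True, hide_item=True):
    """Fit an item-item weight matrix W so that X @ W approximates X (sketch only)."""
    X = csc_matrix(X, dtype=np.float64)       # column slicing is cheap in CSC
    n_items = X.shape[1]
    W = np.zeros((n_items, n_items))
    for j in range(n_items):
        y = X[:, j].toarray().ravel()         # target: every user's interaction with item j
        X_j = X.copy()
        if hide_item:
            # zero the stored entries of column j so item j cannot predict itself
            X_j.data[X_j.indptr[j]:X_j.indptr[j + 1]] = 0.0
        reg = ElasticNet(alpha=alpha, l1_ratio=l1_ratio,
                         positive=positive_only, fit_intercept=False)
        reg.fit(X_j, y)
        W[:, j] = reg.coef_
    return W                                   # ranking scores for all users: X @ W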

Using this standard copy-pasted code to run it:

# imports assumed here (they are not shown in the original snippet); the paths follow
# RecBole's quick-start example, with SLIMElastic coming from the branch in the PR above
from logging import getLogger
from recbole.config import Config
from recbole.data import create_dataset, data_preparation
from recbole.model.general_recommender import SLIMElastic
from recbole.trainer import Trainer
from recbole.utils import init_logger, init_seed

dataset_name = "ml-100k"
model = SLIMElastic

config = Config(model=model, dataset=dataset_name)
init_seed(config['seed'], config['reproducibility'])

# logger initialization
init_logger(config)
logger = getLogger()

logger.info(config)

# dataset filtering
dataset = create_dataset(config)
logger.info(dataset)

# dataset splitting
train_data, valid_data, test_data = data_preparation(config, dataset)

# model loading and initialization
model = model(config, train_data).to(config['device'])
logger.info(model)

# trainer loading and initialization
trainer = Trainer(config, model)

# model training
best_valid_score, best_valid_result = trainer.fit(train_data, valid_data)

# model evaluation
test_result = trainer.evaluate(test_data)

logger.info('best valid result: {}'.format(best_valid_result))
logger.info('test result: {}'.format(test_result))

Results: INFO test result: {'recall@10': 0.8461, 'mrr@10': 0.5374, 'ndcg@10': 0.7102, 'hit@10': 1.0, 'precision@10': 0.6309}

Also, my HyperOpt log is highly suspicious:

alpha:0.316482837679784, hide_item:False, l1_ratio:0.9890017268444972, positive_only:False
Valid result:
recall@10 : 0.8461    mrr@10 : 0.5368    ndcg@10 : 0.7099    hit@10 : 1.0000    precision@10 : 0.6309    
Test result:
recall@10 : 0.8461    mrr@10 : 0.5374    ndcg@10 : 0.7102    hit@10 : 1.0000    precision@10 : 0.6309

...

alpha:0.47984629320482386, hide_item:False, l1_ratio:0.9907136437218732, positive_only:True
Valid result:
recall@10 : 0.8461    mrr@10 : 0.5368    ndcg@10 : 0.7099    hit@10 : 1.0000    precision@10 : 0.6309    
Test result:
recall@10 : 0.8461    mrr@10 : 0.5374    ndcg@10 : 0.7102    hit@10 : 1.0000    precision@10 : 0.6309

...

alpha:0.9530393537754144, hide_item:True, l1_ratio:0.24064058250190196, positive_only:True
Valid result:
recall@10 : 0.6251    mrr@10 : 0.3611    ndcg@10 : 0.4954    hit@10 : 0.9650    precision@10 : 0.4709    
Test result:
recall@10 : 0.6535    mrr@10 : 0.4012    ndcg@10 : 0.5357    hit@10 : 0.9745    precision@10 : 0.5019    

Exact same results with different parameters?

I figure that if there were a mistake in my implementation it would cause bad performance, not suspiciously good performance.
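
One cheap sanity check for the identical-metrics symptom is to confirm that the ElasticNet solve is even sensitive to alpha and l1_ratio on this data. The sketch below uses stand-in data; in practice X and y would come from the train matrix built above. If the learned weights barely change between settings, identical metrics are expected and the regularization strength is the place to look; if the weights change but the metrics do not, the evaluation path is more suspect:

import numpy as np
from sklearn.linear_model import ElasticNet

def coef_fingerprint(X, y, alpha, l1_ratio):
    """Fit one item's ElasticNet regression and return its coefficient vector."""
    reg = ElasticNet(alpha=alpha, l1_ratio=l1_ratio, fit_intercept=False)
    reg.fit(X, y)
    return reg.coef_

# Stand-in data for the sketch; replace with the real train user-item matrix,
# using one item's column as y and hiding that column in X.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(200, 50)).astype(float)
y = X[:, 0].copy()
X[:, 0] = 0.0

w1 = coef_fingerprint(X, y, alpha=0.32, l1_ratio=0.99)
w2 = coef_fingerprint(X, y, alpha=0.48, l1_ratio=0.99)
print("nonzero coefficients:", np.count_nonzero(w1), np.count_nonzero(w2))
print("max |w1 - w2|:", np.abs(w1 - w2).max())
# Near-identical weights for both settings would explain identical metrics;
# different weights with identical metrics would point at evaluation instead.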

Anyone know what could be causing this?

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Comments: 7 (7 by maintainers)

Top GitHub Comments

1 reaction
tsotfsk commented, Dec 28, 2020

Hi @deklanw! 😊 Your code looks fine. I also tested the HyperOpt module after adding noise, and it works well: the implausible metrics disappear, and the effect of the noise on the results is small enough to ignore.
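
The comment doesn't show the exact change, so purely as an illustration of the idea: "adding noise" here presumably means perturbing the stored interaction values slightly before fitting, along the lines of this hypothetical sketch (the function name and noise scale are made up):

import numpy as np
from scipy.sparse import csr_matrix

def add_interaction_noise(X, scale=1e-3, seed=42):
    """Perturb the stored interaction values with a little Gaussian noise.

    Sketch only: the actual change tested on the RecBole side is not shown
    in the comment above, so this is just one plausible reading of it.
    """
    X = csr_matrix(X, dtype=np.float64, copy=True)
    rng = np.random.default_rng(seed)
    X.data += scale * rng.standard_normal(X.data.shape)   # only nonzero entries are touched
    return X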

0 reactions
deklanw commented, Dec 28, 2020

Thanks for the help
