Summaries to include in report.md generators
The report generators in `supervised_train.py` and `supervised_param_train.py` are great! They make it much easier to browse the results of the numerical experiments @yuanqing-wang has been doing.
A wishlist of things that would be good to include in future iterations of the report generator:
- A few other quick summaries that may be useful to add are `variance(target)`, `stddev(target)`, and `mean_absolute_error`. For example, to compare with MoleculeNet benchmark results on the QM9 energy regression task, it would be useful to have MAE. To put the RMSE in context, it would be good to know the standard deviation of the target values. (See the summary sketch after this list.)
- In the model summary section we have a lot of important detail about layer sizes, etc. Could we also add a description of how node, edge, etc. features are initialized? (Currently only the input dimension is given.) It would also be good to describe the loss function in more detail here. The description mentions that `loss_fn=mse_loss`, but @yuanqing-wang mentioned on Slack that this loss is measured on a normalized regression target.
- For R^2, could you include the definition used, perhaps in a footnote? The reported values are often negative, and I think it is using the definition 1 - (residual sum of squares) / (total sum of squares), as in `sklearn.metrics.r2_score`, but a reader might reasonably expect one of the other definitions that leads to a non-negative value. (The sketch after this list computes R^2 with this definition.)
- For R^2, the reported value is often rounded to 1.00. We might need to use more digits of precision here.
- Another plot that may be informative is a scatter plot of predictions against targets, so we can see the variance of the target quantity, whether a few outliers are dominating the RMSE summary, etc. (See the plotting sketch after this list.)
- The plots should have their axes labeled. In some cases the x-axis is the number of optimizer steps, and in some cases the number of epochs. In some cases I think the y-axis is in units of kcal/mol, and in some cases it measures error on the regression target normalized to have mean 0 and variance 1.
- In some reports, the final iterate is much worse than the best iterate. For example, in this report, an RMSE of ~5-10 (kcal/mol?) and an R^2 of ~1 are attained after 60 epochs, but then the optimizer moves way uphill and never comes back, and the report includes a table saying the model obtained an RMSE of 150 (kcal/mol?) and an R^2 of 0.25. Since we're using an optimizer that doesn't always step in descent directions, could we also add to the summary a description of the best iterate encountered, in addition to the currently summarized last iterate? (See the tracking sketch after this list.)
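To make the first couple of points concrete, here is a rough sketch of the kind of summary block meant above. It assumes `targets` and `predictions` are plain NumPy arrays on the same scale (e.g. kcal/mol); the function name and dictionary keys are placeholders, not existing code in `supervised_train.py`:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, r2_score

def summarize(targets, predictions):
    """Collect the extra summary statistics discussed above.

    `targets` and `predictions` are 1-D NumPy arrays on the same scale
    (e.g. kcal/mol); the names and keys here are illustrative only.
    """
    residuals = predictions - targets
    rmse = np.sqrt(np.mean(residuals ** 2))
    mae = mean_absolute_error(targets, predictions)  # for MoleculeNet-style comparison
    target_var = np.var(targets)                     # variance of the target
    target_std = np.std(targets)                     # puts the RMSE in context

    # R^2 with the same definition as sklearn.metrics.r2_score:
    # 1 - (residual sum of squares) / (total sum of squares); can be negative.
    ss_res = np.sum(residuals ** 2)
    ss_tot = np.sum((targets - np.mean(targets)) ** 2)
    r2 = 1.0 - ss_res / ss_tot
    assert np.isclose(r2, r2_score(targets, predictions))

    return {
        "rmse": rmse,
        "mae": mae,
        "target_variance": target_var,
        "target_stddev": target_std,
        "r2": r2,
    }
```

Formatting `r2` with more digits (e.g. `f"{r2:.4f}"`) would also address the rounding-to-1.00 issue.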
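For the scatter plot, a minimal matplotlib sketch (the units argument is a guess; the axis labels would depend on whether the regression target is normalized):

```python
import matplotlib.pyplot as plt

def scatter_predictions(targets, predictions, units="kcal/mol"):
    """Scatter of predictions against targets, with a y = x reference line."""
    fig, ax = plt.subplots()
    ax.scatter(targets, predictions, s=10, alpha=0.5)
    lo = min(targets.min(), predictions.min())
    hi = max(targets.max(), predictions.max())
    ax.plot([lo, hi], [lo, hi], linestyle="--", color="gray")  # perfect-prediction line
    ax.set_xlabel(f"target ({units})")
    ax.set_ylabel(f"prediction ({units})")
    return fig
```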
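And one way the best iterate could be tracked alongside the last iterate, sketched with hypothetical helpers (`train_one_epoch`, `evaluate_rmse`) standing in for whatever `supervised_train.py` actually does per epoch; `model.state_dict()` assumes a PyTorch model:

```python
import copy

def train_and_track_best(model, optimizer, train_one_epoch, evaluate_rmse, n_epochs):
    """Run training while remembering the best iterate seen so far."""
    best = {"epoch": None, "rmse": float("inf"), "state_dict": None}
    last_rmse = None

    for epoch in range(n_epochs):
        train_one_epoch(model, optimizer)
        last_rmse = evaluate_rmse(model)

        if last_rmse < best["rmse"]:
            # Keep a copy of the best parameters, since the optimizer is free
            # to move uphill later and never recover.
            best = {
                "epoch": epoch,
                "rmse": last_rmse,
                "state_dict": copy.deepcopy(model.state_dict()),
            }

    # Report both, so the summary table can show best and last side by side.
    return {
        "last_rmse": last_rmse,
        "best_epoch": best["epoch"],
        "best_rmse": best["rmse"],
        "best_state_dict": best["state_dict"],
    }
```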
Top GitHub Comments
Nice! Looks like something in this direction may be an improvement: it would separate the computation of summary statistics from the generation of formatted reports, which are currently intertwined.
A couple of minor comments:
- The functions take `results` dictionaries with a specific structure that depends on the result type, hinting that these may be better off living inside a results class (`results.save_html()`, `multiple_results_object.save_html()`, `multiple_results_object.save_html(grid=True)`, ..., rather than `html(results_dict)`, `html_multiple_train_and_test(results)`, `html_multiple_train_and_test_2d_grid(results)`, ...).
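To illustrate, a minimal sketch of what such a results class might look like (the method names follow the comment above; everything else, including `to_html` and the rendered HTML, is hypothetical):

```python
class Results:
    """Metrics for a single train/test run, plus knowledge of how to render them."""

    def __init__(self, metrics):
        self.metrics = metrics  # e.g. {"rmse": ..., "mae": ..., "r2": ...}

    def to_html(self):
        rows = "".join(
            f"<tr><td>{name}</td><td>{value:.4f}</td></tr>"
            for name, value in self.metrics.items()
        )
        return f"<table>{rows}</table>"

    def save_html(self, path="report.html"):
        with open(path, "w") as f:
            f.write(self.to_html())


class MultipleResults:
    """A collection of Results, e.g. one per dataset split or hyperparameter setting."""

    def __init__(self, results):
        self.results = list(results)

    def save_html(self, path="report.html", grid=False):
        # grid=True could arrange the runs in a 2-D grid (e.g. over two
        # hyperparameters); here they are simply concatenated.
        with open(path, "w") as f:
            f.write("".join(r.to_html() for r in self.results))
```

This keeps the statistics computation in plain dictionaries and puts all the formatting behind `save_html()`, which is the separation mentioned at the top of the comment.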