Questions on best practices
In beginning to write methods, a number of questions came to mind:
- How hesitant should we be to pollute our dev/build environment with dependencies needed to validate methods? statsmodels is a relatively lightweight example where I’d like to be able to verify regression statistics with it, but it’s not particularly well maintained and might introduce some issues down the road with relying on older versions of scipy/numpy/etc. Hail and scikit-allel are other examples where I would like to demonstrate equivalency for certain things, and while showing this is really only necessary at first, it would be nice if there was a way to make that kind of verification recurrent. A few possible approaches could be:
  - Make any validation relying on packages that do something at a higher level than array ops (e.g. anything w/ statistics or genetics) part of an integration (or “validation”) test process with separate environments (see the pytest sketch after this list).
  - Export data from software like this and use only the data in unit tests. This is what I usually do, but typically without much documentation/reproducibility; I’d rather have a good place to document/share that process. Maybe an sgkit-validation repo would be a good place to organize/document these scripts with messy environments?
  - Do nothing. I haven’t seen this done much in other projects, so perhaps we don’t make it a priority. The stringency of our tests will likely slip, but it may be worth avoiding all the extra complexity.
- Should we add readthedocs integration now? I’d like to not have to reformat all my docstrings later and it would be nice to know what conventions we want to follow.
- Is there a preferred way to organize publication references within a scientific codebase? I started adding Chicago-style citations in a docstring “References” block, but I’m not sure what the best practice for this is. Scikit-learn is probably a good model to follow, though I don’t know how they organize references (e.g. orthogonal-matching-pursuit-omp). It seems to be a combination of “Notes” in docstrings and separate references sections in .rst docs (a numpydoc-style sketch appears after this list).
- Are we taking a hard line on single vs double string quoting? I prefer single quotes, but I’m happy to try to remember to stick with one or the other.
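
For the first approach, one way to wire this up is a pytest marker that is skipped unless explicitly requested; the marker name, option name, and conftest.py layout here are all assumptions, not an agreed convention:

```python
# conftest.py -- a minimal sketch of an opt-in "validation" test tier.
import pytest


def pytest_addoption(parser):
    parser.addoption(
        "--run-validation", action="store_true", default=False,
        help="run validation tests against statsmodels/scikit-allel/etc.",
    )


def pytest_configure(config):
    config.addinivalue_line(
        "markers", "validation: tests that compare against external packages"
    )


def pytest_collection_modifyitems(config, items):
    if config.getoption("--run-validation"):
        return
    # Skip validation-marked tests in the default (lightweight) run.
    skip = pytest.mark.skip(reason="needs --run-validation and the validation env")
    for item in items:
        if "validation" in item.keywords:
            item.add_marker(skip)
```

The unit suite then stays a plain `pytest` run, while a separate CI job with the heavier environment runs `pytest --run-validation`.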
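And for the references question, a numpydoc-style “References” section is the convention scikit-learn and numpy follow; the function name and citation text below are placeholders:

```python
def some_method(ds):
    """One-line summary of the method.

    References
    ----------
    .. [1] Author, A., and B. Author. Year. "Placeholder Title."
           Journal Name. https://doi.org/...
    """
```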
Top GitHub Comments
I don’t think there are well-adopted standards here. IMO a title and URL are sufficient. For example, if your docstrings support markdown you could do:
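A minimal sketch of the idea; the function name and the title/URL placeholders are illustrative, not from the original comment:

```python
def some_statistic(ds):
    """One-line summary of what the method computes.

    References:
    - [Title of the primary paper](https://doi.org/...)
    - [Discussion thread that motivated the implementation](https://github.com/...)
    """
```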
Most readers looking through the source code or auto-generated docs are either humans (who can look at the title and follow the link) or search engines (which care about the link). Also, you’ll likely have many references that aren’t to traditional publications, like GitHub issues or StackExchange comments. Styles like Chicago are a bit outdated when it comes to representing online forum posts.
For actual references in a publication, I like Manubot’s reference style (example), but I’m partial since I helped design it. In code, though, I generally just use a title and URL, possibly with the first author’s last name and year of publication if those details are helpful.
The unit-vs-validation testing distinction is tricky. How do we write unit tests if we can’t check the answer? On the other hand, once you start checking statistical outputs in the unit tests, it’s very difficult to keep the unit test suite from becoming slow. Maybe the thing to do is keep the unit tests very, very simple, and keep all checking of statistical correctness in the validation tests? That would make it much easier for regressions to slip through, though.
I think there must be some checking of statistical correctness in the unit tests, and having dependencies on a small number of external packages is fine for this. For population genetics methods, I think we can validate against tskit, scikit-allel, and maybe pylibseq. Tskit and scikit-allel are straightforward dependencies. We can generate test data for popgen stuff using msprime, which is also a well-behaved dependency.
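
As a concrete illustration, a cross-check of nucleotide diversity against scikit-allel on msprime-simulated data might look like the sketch below. The validation marker ties back to the pytest setup sketched earlier, and the simulation parameters and tolerance are assumptions:

```python
import allel
import msprime
import numpy as np
import pytest


@pytest.mark.validation
def test_diversity_matches_scikit_allel():
    # Simulate a small haploid sample (legacy msprime 0.x-style API).
    ts = msprime.simulate(
        sample_size=20, length=1e5, Ne=1e4,
        mutation_rate=1e-8, recombination_rate=1e-8, random_seed=42,
    )

    # Sum over sites of mean pairwise differences, via tskit's
    # statistics framework (span_normalise=False leaves it unscaled).
    pi_tskit = ts.diversity(mode="site", span_normalise=False)

    # The same quantity recomputed from the raw genotype matrix
    # with scikit-allel.
    haps = allel.HaplotypeArray(ts.genotype_matrix())
    pi_allel = allel.mean_pairwise_difference(haps.count_alleles()).sum()

    # Both compute the probability that two samples differ at a site,
    # summed over sites, so they should agree to floating-point error.
    np.testing.assert_allclose(pi_tskit, pi_allel, rtol=1e-6)
```

Under plain `pytest` this is skipped; a separate validation job would run it with `--run-validation` in an environment that has msprime and scikit-allel installed.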