Questions on best practices
In beginning to write methods, a number of questions came to mind:
- How hesitant should we be to pollute our dev/build environment with dependencies needed to validate methods? statsmodels is a relatively lightweight example where I’d like to be able to verify regression statistics with it, but it’s not particularly well maintained and might introduce some issues down the road with relying on older versions of scipy/numpy/etc. Hail and scikit-allel are other examples where I would like to demonstrate equivalency for certain things, and while showing this is really only necessary at first, it would be nice if there was a way to make that kind of verification recurrent. A few possible approaches could be:
  - Make any validation relying on packages that do something at a higher level than array ops (e.g. anything w/ statistics or genetics) part of an integration (or “validation”) test process with separate environments (see the pytest sketch after this list).
  - Export data from software like this and use only the data in unit tests. This is what I usually do, but typically without much documentation/reproducibility; I’d rather have a good place to document/share that process. Maybe an sgkit-validation repo would be a good place to organize/document these scripts with messy environments?
  - Do nothing. I haven’t seen this done much in other projects, so perhaps we don’t make it a priority. The stringency of our tests will likely slip, but it may be worth avoiding all the extra complexity.
- Should we add readthedocs integration now? I’d like to not have to reformat all my docstrings later and it would be nice to know what conventions we want to follow.
- Is there a preferred way to organize publication references within a scientific codebase? I started adding Chicago-style citations in a docstring “References” block, but I’m not sure what the best practice for this is. Scikit-learn is probably a good model to follow, though I don’t know how they organize references (e.g. orthogonal-matching-pursuit-omp). It seems to be a combination of “Notes” in docstrings and separate references sections in .rst docs (a numpydoc-style sketch appears after this list).
- Are we taking a hard line on single vs double string quoting? I prefer single quotes, but I’m happy to try to remember to stick with one or the other.
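
For the first approach, one way to wire this up is a pytest marker that is skipped unless explicitly requested; the marker name, option name, and conftest.py layout here are all assumptions, not an agreed convention:

```python
# conftest.py -- a minimal sketch of an opt-in "validation" test tier.
import pytest


def pytest_addoption(parser):
    parser.addoption(
        "--run-validation", action="store_true", default=False,
        help="run validation tests against statsmodels/scikit-allel/etc.",
    )


def pytest_configure(config):
    config.addinivalue_line(
        "markers", "validation: tests that compare against external packages"
    )


def pytest_collection_modifyitems(config, items):
    if config.getoption("--run-validation"):
        return
    # Skip validation-marked tests in the default (lightweight) run.
    skip = pytest.mark.skip(reason="needs --run-validation and the validation env")
    for item in items:
        if "validation" in item.keywords:
            item.add_marker(skip)
```

The unit suite then stays a plain `pytest` run, while a separate CI job with the heavier environment runs `pytest --run-validation`.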
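And for the references question, a numpydoc-style “References” section is the convention scikit-learn and numpy follow; the function name and citation text below are placeholders:

```python
def some_method(ds):
    """One-line summary of the method.

    References
    ----------
    .. [1] Author, A., and B. Author. Year. "Placeholder Title."
           Journal Name. https://doi.org/...
    """
```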
Top GitHub Comments
I don’t think there are well-adopted standards here. IMO a title and URL are sufficient. For example, if your docstrings support markdown you could do:
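A minimal sketch of the idea; the function name and the title/URL placeholders are illustrative, not from the original comment:

```python
def some_statistic(ds):
    """One-line summary of what the method computes.

    References:
    - [Title of the primary paper](https://doi.org/...)
    - [Discussion thread that motivated the implementation](https://github.com/...)
    """
```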
Most readers looking through the source code or auto-generated docs are either humans (who can look at the title and follow the link) or search engines (which care about the link). Also, you’ll likely have many references that aren’t to traditional publications, like GitHub issues or StackExchange comments. Styles like Chicago are a bit outdated when it comes to representing online forum posts.
For actual references in a publication, I like Manubot’s reference style (example), but I’m partial since I helped design it. In code, though, I generally just use a title and URL, possibly with the first author’s last name and year of publication if those details are helpful.
The unit-vs-validation testing distinction is tricky. How do we write unit tests if we can’t check the answer? On the other hand, once you start checking statistical outputs in the unit tests, it’s very difficult to keep the unit test suite from becoming slow. Maybe the thing to do is keep the unit tests very, very simple, and keep all checking of statistical correctness in the validation tests? That would make it much easier for regressions to slip through, though.
I think there must be some checking of statistical correctness in the unit tests, and having dependencies on a small number of external packages is fine for this. For population genetics methods, I think we can validate against tskit, scikit-allel, and maybe pylibseq. Tskit and scikit-allel are straightforward dependencies. We can generate test data for popgen stuff using msprime, which is also a well-behaved dependency.
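
As a concrete illustration, a cross-check of nucleotide diversity against scikit-allel on msprime-simulated data might look like the sketch below. The validation marker ties back to the pytest setup sketched earlier, and the simulation parameters and tolerance are assumptions:

```python
import allel
import msprime
import numpy as np
import pytest


@pytest.mark.validation
def test_diversity_matches_scikit_allel():
    # Simulate a small haploid sample (legacy msprime 0.x-style API).
    ts = msprime.simulate(
        sample_size=20, length=1e5, Ne=1e4,
        mutation_rate=1e-8, recombination_rate=1e-8, random_seed=42,
    )

    # Sum over sites of mean pairwise differences, via tskit's
    # statistics framework (span_normalise=False leaves it unscaled).
    pi_tskit = ts.diversity(mode="site", span_normalise=False)

    # The same quantity recomputed from the raw genotype matrix
    # with scikit-allel.
    haps = allel.HaplotypeArray(ts.genotype_matrix())
    pi_allel = allel.mean_pairwise_difference(haps.count_alleles()).sum()

    # Both compute the probability that two samples differ at a site,
    # summed over sites, so they should agree to floating-point error.
    np.testing.assert_allclose(pi_tskit, pi_allel, rtol=1e-6)
```

Under plain `pytest` this is skipped; a separate validation job would run it with `--run-validation` in an environment that has msprime and scikit-allel installed.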