Easy access to citations for specific methods
Hello,
As a scientist, I would appreciate easy access to citation information for a specific estimator. A very good example is the citation function from R (https://www.rdocumentation.org/packages/utils/versions/3.6.2/topics/citation), which provides the citation for each package (and, even better, in BibTeX format!).
The citation information in sklearn is more or less available in the References sections of the docstrings, and I coded a short function that basically prints that section:
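import re
import textwrap


def _get_docstring_sections(txt):
    """Return the section contents of a numpy-formatted docstring as a dict."""
    if not isinstance(txt, str):
        txt = ''
    # remove docstring indentation; the first line has none, so pad it
    # (this assumes the usual 4-space docstring indentation)
    txt = textwrap.dedent('    ' + txt)
    # split the docstring at the section underlines (------)
    split_sec = re.split('\n--*\n', txt)
    secs = {}
    # the first title is the last line before the first underline
    lines = split_sec[0].splitlines()
    if len(lines):
        title = lines[-1]
        for i in range(1, len(split_sec)):
            lines = split_sec[i].splitlines()
            if i < len(split_sec) - 1:
                # the last line of this chunk is the title of the next section
                secs[title] = '\n'.join(lines[:-1])
                title = lines[-1]
            else:
                # final section: every line belongs to it
                secs[title] = '\n'.join(lines)
    return secs


def cite(obj=None, format=None):
    """Print citation information from an object with a numpy docstring."""
    # `format` is currently unused (a placeholder for e.g. a bibtex option)
    txt = """No references found for this object"""
    if obj is not None:
        secs = _get_docstring_sections(obj.__doc__)
        if 'References' in secs:
            txt = secs['References']
    print(txt)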
It seems to work rather well in practice and prints the references in rst format. In a perfect world we could also optionally return BibTeX, but that seems quite hard without calling some reference web API.
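Short of BibTeX, a first step could be splitting the rst text into individual entries. The following is only a minimal sketch: `split_references` is a hypothetical helper, it assumes each entry starts with a numpydoc citation marker such as `.. [1]` (many docstrings do not follow that convention), and the entries in `example` are placeholders, not real references.

```python
import re


def split_references(references):
    """Split the text of a References section into (label, text) tuples.

    Assumption: every entry starts with an rst citation marker
    such as ``.. [1]`` at the beginning of a line.
    """
    marker = re.compile(r'^\.\. \[([^\]]+)\]\s*', flags=re.MULTILINE)
    matches = list(marker.finditer(references))
    entries = []
    # pair each marker with the next one to find where its entry ends
    for current, following in zip(matches, matches[1:] + [None]):
        end = following.start() if following else len(references)
        text = references[current.end():end]
        # collapse line breaks and the indentation of continuation lines
        entries.append((current.group(1), ' '.join(text.split())))
    return entries


# placeholder entries, for illustration only
example = """\
.. [1] Placeholder Author, "Placeholder title one",
   Placeholder Journal.

.. [2] Another Author, "Placeholder title two".
"""

for label, text in split_references(example):
    print(label, '->', text)
```

Text that carries no marker at all yields an empty list, so a caller could fall back to printing the raw section in that case.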
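I ran it on all estimators returned by sklearn.utils.testing.all_estimators():
# recent scikit-learn releases provide all_estimators in sklearn.utils instead
import sklearn.utils.testing

lst_all = sklearn.utils.testing.all_estimators()
for name, cl in lst_all:
    fmt = '\n\n{} : {}'.format(name, cl)
    print(fmt)
    # underline the header (fmt starts with two newlines, hence the -2)
    print('=' * (len(fmt) - 2))
    cite(cl)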
It returns a lot of nice references for the methods, with some notable gaps that are actually due to documentation errors (for instance, References in sklearn.svm.SVR is not a proper section). It also works on any Python object that uses the numpy docstring format with a References section, so I find it quite practical.
Would you be interested in a PR adding this function, maybe in sklearn.utils?
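The empty results caused by documentation errors could be flagged automatically. This is a rough sketch, not a definitive check: `reference_section_status` is a hypothetical helper, and it assumes that a "malformed" section is one where the word References sits alone on a line without the dashed underline. The three small functions are made-up examples.

```python
import inspect
import re


def reference_section_status(obj):
    """Classify how the docstring of `obj` presents its references.

    Returns 'section' for a properly underlined numpydoc section,
    'malformed' if "References" stands alone on a line without the
    dashed underline, and 'none' otherwise.
    """
    # inspect.getdoc removes the docstring indentation for us
    doc = inspect.getdoc(obj) or ''
    if re.search(r'^References[ \t]*\n-+[ \t]*$', doc, flags=re.MULTILINE):
        return 'section'
    if re.search(r'^References:?[ \t]*$', doc, flags=re.MULTILINE):
        return 'malformed'
    return 'none'


# hypothetical objects used only to illustrate the three outcomes
def good():
    """Summary.

    References
    ----------
    .. [1] Placeholder entry.
    """


def bad():
    """Summary.

    References

    .. [1] Placeholder entry.
    """


def plain():
    """Summary."""


for func in (good, bad, plain):
    print(func.__name__, reference_section_status(func))
```

Applied to each class returned by all_estimators(), the 'malformed' results would be candidates for a docstring fix.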
Issue Analytics
- Created 4 years ago
- Comments:9 (7 by maintainers)
Top GitHub Comments
Just found out about this https://github.com/duecredit/duecredit
OK, I’m all for better documentation. I’m just saying that a lot of publications nowadays cite sklearn (one of the main papers) instead of the proper paper corresponding to the estimator, when they should cite both. This might be laziness, but that’s why a simple function that prints a list of references could help, assuming of course they know about it.
When there are several references, the function returns them all, and the user can check in the documentation which one corresponds to the parameters they are using.
Anyway, it was just an idea. I think I will implement it for my toolbox and see if I can find a way to add that awesome BibTeX output that the R folks have. The way it is implemented, it works on functions from numpy/scipy/sklearn and my toolbox, as long as they use the numpy docstring format.