
Unify API for loader functions


The molnet loader functions are currently divided into two groups with inconsistent APIs for specifying featurizers and splitters.

Most of them take the arguments featurizer and split, each of which accepts one of a fixed set of hardcoded strings, such as featurizer='ECFP' or split='random'. This causes several problems.

  1. The list of accepted values is undocumented.
  2. There’s also no way to discover them programmatically.
  3. There’s no way to specify a splitter or featurizer that isn’t on the list.
  4. The list of allowed options varies widely between datasets. If there’s a coherent set of rules behind them, I don’t know what it is.

Then there are the ones that use the template introduced in #1938. These work rather differently. The argument for specifying the splitter is called splitter instead of split. The arguments may take either the name of a class, the class itself, or an instance of the class. These functions have their own set of issues.

  1. The list of accepted values is again undocumented.
  2. It is possible to discover them programmatically (for example from dc.molnet.load_function.zinc15_datasets.zinc15_splitters), but the mechanism is itself undocumented.
  3. Much of the documentation is incorrect. For example,

https://github.com/deepchem/deepchem/blob/ea0fe592fe4228109ea955c97c9d6e06c280e5a1/deepchem/molnet/load_function/zinc15_datasets.py#L91-L92

In fact, the value should specify a single splitter, not "allowed splitters", and the docstring never mentions the possibility of passing a string or a class.

  4. It isn't clear to me that there's really much benefit from having so many options. featurizer='CircularFeaturizer' isn't substantially clearer, shorter, or more convenient than featurizer=dc.feat.CircularFeaturizer().
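The name, class, or instance behaviour of the #1938 template can be sketched as a small resolver. This is a minimal, self-contained sketch, not DeepChem's code: the Splitter stand-in classes, the SPLITTERS dict, and the resolve helper are all hypothetical, loosely modelled on module-level dicts such as zinc15_splitters.

```python
from typing import Dict, Type, TypeVar, Union

T = TypeVar("T")


class Splitter:
    """Hypothetical stand-in for a splitter base class (not dc.splits.Splitter)."""


class RandomSplitter(Splitter):
    pass


class ScaffoldSplitter(Splitter):
    pass


# Hypothetical registry mapping accepted names to classes. Because it is a
# plain dict, sorted(SPLITTERS) is also a way to discover the accepted names.
SPLITTERS: Dict[str, Type[Splitter]] = {
    "random": RandomSplitter,
    "scaffold": ScaffoldSplitter,
}


def resolve(spec: Union[str, Type[T], T], registry: Dict[str, Type[T]], base: Type[T]) -> T:
    """Turn a registered name, a subclass of base, or an instance into an instance."""
    if isinstance(spec, str):
        # A name: look up the class and instantiate it with default arguments.
        try:
            return registry[spec]()
        except KeyError:
            raise ValueError(
                f"Unknown name {spec!r}; expected one of {sorted(registry)}"
            ) from None
    if isinstance(spec, type) and issubclass(spec, base):
        # A class: instantiate it with default arguments.
        return spec()
    if isinstance(spec, base):
        # Already an instance: use it unchanged.
        return spec
    raise TypeError(
        f"Expected a name, a {base.__name__} subclass, or an instance, got {spec!r}"
    )
```

Accepting three kinds of input is what makes the template flexible, but it is also why each form has to be documented separately, which is the gap the list above describes.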

We should come up with a single consistent API for all of them.

Issue Analytics

  • State: open
  • Created 3 years ago
  • Reactions: 2
  • Comments: 35 (35 by maintainers)

Top GitHub Comments

1 reaction
peastman commented, Oct 12, 2020

How about this design:

  1. All loader functions will work the same way.
  2. The featurizer will be specified with featurizer. It can take either a Featurizer object or one of the special names.
  3. The splitter will be specified with splitter. It can take either a Splitter object or one of the special names.
  4. split will be accepted as a deprecated synonym for splitter.
  5. The default splitter will always be the one that’s recommended in the docstring.
  6. Because you can pass arbitrary objects, there is no longer a fixed list of supported featurizers and splitters for each one. We should, however, document the recommended choices.
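Points 1 through 5 of this proposal can be illustrated with a hypothetical loader. This is a sketch under stated assumptions, not the DeepChem implementation: load_example_dataset, the stand-in Featurizer and Splitter classes, the special names 'ECFP', 'random' and 'scaffold', and the choice of 'scaffold' as the default are all placeholders. The loader returns the resolved objects instead of loading data so that only the argument handling is shown.

```python
import warnings
from typing import Optional, Tuple, Union


class Featurizer:
    """Hypothetical stand-in for a featurizer base class."""


class ECFPFeaturizer(Featurizer):
    pass


class Splitter:
    """Hypothetical stand-in for a splitter base class."""


class RandomSplitter(Splitter):
    pass


class ScaffoldSplitter(Splitter):
    pass


# Hypothetical "special names"; a real loader would document its own.
FEATURIZERS = {"ECFP": ECFPFeaturizer}
SPLITTERS = {"random": RandomSplitter, "scaffold": ScaffoldSplitter}


def _resolve(spec, registry, base):
    """Accept either one of the special names or an instance of base."""
    if isinstance(spec, str):
        if spec not in registry:
            raise ValueError(f"Unknown name {spec!r}; expected one of {sorted(registry)}")
        return registry[spec]()
    if isinstance(spec, base):
        # Arbitrary objects are passed through, so no fixed list is enforced.
        return spec
    raise TypeError(f"Expected a {base.__name__} or one of {sorted(registry)}, got {spec!r}")


def load_example_dataset(
    featurizer: Union[str, Featurizer] = "ECFP",
    splitter: Union[str, Splitter] = "scaffold",  # default = the recommended splitter
    split: Optional[Union[str, Splitter]] = None,  # deprecated synonym for splitter
) -> Tuple[Featurizer, Splitter]:
    """Hypothetical loader following the proposed design."""
    if split is not None:
        # In this sketch, the deprecated argument takes precedence when given.
        warnings.warn("'split' is deprecated; use 'splitter' instead",
                      DeprecationWarning, stacklevel=2)
        splitter = split
    return _resolve(featurizer, FEATURIZERS, Featurizer), _resolve(splitter, SPLITTERS, Splitter)
```

Because an instance bypasses the name table entirely, the table only needs to list recommended choices, which matches point 6.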

I can do 1-5. The creators and maintainers of particular datasets will need to do 6.

0 reactions
ncfrey commented, Nov 20, 2020

I’ll take a crack at overhauling load_pdbbind so I can include it in the new tutorial on predicting protein-ligand binding with the new interaction fingerprints.

