Errors featurizing PDBBind with RdkitGridFeaturizer

I’m trying to run this script from the book examples, which uses RdkitGridFeaturizer to featurize PDBBind. Here’s what I find.

  • With the latest code, using the core subset and featurizing only the binding pockets, it is much faster than before (minutes instead of hours).

  • While running, it fills the console with a huge number of lines like these:

Coordinates are outside of the box (atom id = 7, coords xyz = [-3.97537037  0.55992593 -8.72777778], coords in box = [ 2  4 -1]
Coordinates are outside of the box (atom id = 105, coords xyz = [-1.53037037 -1.94707407 -8.81577778], coords in box = [ 3  3 -1]
Coordinates are outside of the box (atom id = 106, coords xyz = [-1.41437037 -0.88707407 -9.36477778], coords in box = [ 3  3 -1]

I suspect there’s one line for every atom that isn’t in the binding pocket.

  • It fails with this error:
/home/peastman/miniconda3/envs/tf2/lib/python3.8/site-packages/numpy/core/_asarray.py:83: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray
  return array(a, dtype, copy=False, order=order)
Traceback (most recent call last):
  File "pdbbind_nn.py", line 11, in <module>
    n_features = train_dataset.X.shape[1]
IndexError: tuple index out of range
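
The warning and the traceback above are consistent with the featurized array being ragged: if the per-complex feature vectors differ in length, NumPy stores them in a one-dimensional object array, so `X.shape` has a single entry and `X.shape[1]` raises. The sketch below reproduces that behaviour with NumPy alone. The feature vectors are hypothetical stand-ins, not output from RdkitGridFeaturizer, and the thread does not confirm that this is the exact cause.

```python
import numpy as np

# Hypothetical stand-ins for per-complex feature vectors. Their lengths
# differ, which is enough to make the collection ragged.
features = [np.zeros(5), np.zeros(3)]

# Recent NumPy versions require dtype=object here; older ones only emitted
# the VisibleDeprecationWarning quoted in the issue.
X = np.array(features, dtype=object)
print(X.shape)  # (2,): one dimension, so there is no shape[1]

raised = False
try:
    n_features = X.shape[1]
except IndexError as err:
    raised = True
    print("IndexError:", err)

# With uniform-length vectors the array is two-dimensional and shape[1] works.
X_ok = np.stack([np.zeros(5), np.zeros(5)])
print(X_ok.shape[1])  # 5
```

Printing `train_dataset.X.shape` and `train_dataset.X.dtype` before line 11 of the script would show whether the dataset has this one-dimensional object form.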

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 16 (16 by maintainers)

Top GitHub Comments

1 reaction
peastman commented, Mar 9, 2021

Thanks! That seems to have fixed it.

How about the warning messages about coordinates outside the box? Can we eliminate them? If I set the box width to 75 then they go away, but then it runs out of memory trying to create the model because the dataset has 1,975,467 features! And 75 Å is way bigger than a typical binding site. (I assume it’s in Å? The documentation doesn’t say.) The whole point of only featurizing the binding site is that a lot of atoms should be outside the box, so it doesn’t make sense to warn about them.
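
As a rough illustration of the two effects described in this comment, the sketch below maps centred atom coordinates to voxel indices and counts the features of a flattened grid. It is a simplified model, not DeepChem's implementation. The box width of 16 Å, the voxel width of 2 Å, and the 39 features per voxel are assumptions inferred from the numbers in this thread (they reproduce the logged index `[ 2  4 -1]` and the 1,975,467 total); none of them is stated in the issue.

```python
import numpy as np

BOX_WIDTH = 16.0         # assumed box edge length, in angstroms
VOXEL_WIDTH = 2.0        # assumed voxel edge length, in angstroms
FEATURES_PER_VOXEL = 39  # assumed, inferred from the reported feature count


def voxel_index(coords, box_width=BOX_WIDTH, voxel_width=VOXEL_WIDTH):
    """Map coordinates centred on the box to integer voxel indices."""
    shifted = np.asarray(coords, dtype=float) + box_width / 2.0
    return np.floor(shifted / voxel_width).astype(int)


def in_box(index, box_width=BOX_WIDTH, voxel_width=VOXEL_WIDTH):
    """True if every index lies in [0, voxels_per_edge)."""
    voxels_per_edge = int(box_width / voxel_width)
    return bool(np.all((index >= 0) & (index < voxels_per_edge)))


def flat_feature_count(box_width, voxel_width=VOXEL_WIDTH,
                       per_voxel=FEATURES_PER_VOXEL):
    """Number of features when a cubic grid is flattened to one vector."""
    voxels_per_edge = int(box_width / voxel_width)
    return voxels_per_edge ** 3 * per_voxel


# Coordinates of atom id 7 from the console output in the issue.
atom_7 = [-3.97537037, 0.55992593, -8.72777778]
idx = voxel_index(atom_7)
print(idx.tolist(), in_box(idx))  # [2, 4, -1] False

print(flat_feature_count(16.0))   # 19968
print(flat_feature_count(75.0))   # 1975467
```

Under these assumptions the feature count grows with the cube of the box width, so widening the box to silence the warning is far more expensive than simply skipping atoms whose index falls outside the grid.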

0 reactions
ncfrey commented, Mar 9, 2021

Great!

That sounds good, I’ll eliminate the warning messages.
