Errors featurizing PDBBind with RdkitGridFeaturizer
I’m trying to run this script from the book examples, which uses RdkitGridFeaturizer to featurize PDBBind. Here’s what I find.
- With the latest code, using the core subset and featurizing only the binding pockets, it is much faster than before (minutes instead of hours).
- While running, it fills the console with a huge number of lines like these:
Coordinates are outside of the box (atom id = 7, coords xyz = [-3.97537037 0.55992593 -8.72777778], coords in box = [ 2 4 -1]
Coordinates are outside of the box (atom id = 105, coords xyz = [-1.53037037 -1.94707407 -8.81577778], coords in box = [ 3 3 -1]
Coordinates are outside of the box (atom id = 106, coords xyz = [-1.41437037 -0.88707407 -9.36477778], coords in box = [ 3 3 -1]
I suspect there’s one line for every atom that isn’t in the binding pocket?
- It fails with this error:
/home/peastman/miniconda3/envs/tf2/lib/python3.8/site-packages/numpy/core/_asarray.py:83: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray
return array(a, dtype, copy=False, order=order)
Traceback (most recent call last):
File "pdbbind_nn.py", line 11, in <module>
n_features = train_dataset.X.shape[1]
IndexError: tuple index out of range
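The IndexError is consistent with the NumPy warning printed just above it: if the featurizer returns arrays of different lengths for different complexes, the combined feature matrix ends up as a one-dimensional object array, and `shape[1]` does not exist. Here is a minimal sketch of that failure mode, using made-up arrays rather than real PDBBind features. The helper name `feature_count` is hypothetical and not part of DeepChem.

```python
import numpy as np


def feature_count(X):
    """Return the number of feature columns, or raise a descriptive error."""
    if X.ndim != 2:
        raise ValueError(
            f"expected a 2-D feature matrix, got shape {X.shape}; "
            "the per-sample feature vectors may have different lengths"
        )
    return X.shape[1]


# Equal-length feature vectors stack into a regular 2-D matrix.
regular = np.stack([np.zeros(3), np.zeros(3)])

# Unequal-length vectors can only be held in a 1-D object array.
ragged = np.empty(2, dtype=object)
ragged[0] = np.zeros(3)
ragged[1] = np.zeros(5)

print(regular.shape)  # (2, 3)
print(ragged.shape)   # (2,)
```

Indexing `ragged.shape[1]` raises the same `IndexError: tuple index out of range` seen in the traceback, so checking `X.ndim` before reading the feature count gives a clearer message about what went wrong.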
Top GitHub Comments
Thanks! That seems to have fixed it.
How about the warning messages about coordinates outside the box? Can we eliminate them? If I set the box width to 75 then they go away, but then it runs out of memory trying to create the model because the dataset has 1,975,467 features! And 75 Å is way bigger than a typical binding site. (I assume it’s in Å? The documentation doesn’t say.) The whole point of only featurizing the binding site is that a lot of atoms should be outside the box, so it doesn’t make sense to warn about them.
Great!
That sounds good, I’ll eliminate the warning messages.
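The numbers in this thread can be checked with a little grid arithmetic. The sketch below assumes a cubic box centered on the origin and split into cubic voxels, with a box width of 16, a voxel width of 2 (in the same length units as the coordinates), and 39 feature channels per voxel. None of these values is stated in the thread; they are chosen because they reproduce both the logged `coords in box` values and the reported 1,975,467 features for a box width of 75. The function names are illustrative and are not DeepChem's API.

```python
import math


def voxel_index(xyz, box_width=16.0, voxel_width=2.0):
    """Map box-centered coordinates to integer voxel indices along each axis."""
    return [math.floor((c + box_width / 2.0) / voxel_width) for c in xyz]


def in_box(index, box_width=16.0, voxel_width=2.0):
    """Return True if every voxel index falls inside the grid."""
    n = int(box_width // voxel_width)  # voxels per side
    return all(0 <= i < n for i in index)


def flattened_feature_count(box_width, voxel_width, channels=39):
    """Flattened grid size: voxels per side cubed, times channels per voxel.

    The default of 39 channels is an assumption inferred from the reported count.
    """
    n = int(box_width // voxel_width)
    return n ** 3 * channels


# First atom from the warning messages quoted in the issue.
idx = voxel_index([-3.97537037, 0.55992593, -8.72777778])
print(idx, in_box(idx))                # [2, 4, -1] False
print(flattened_feature_count(16, 2))  # 19968
print(flattened_feature_count(75, 2))  # 1975467
```

Under these assumptions, the z index of -1 means the atom lies just outside the box, which is expected when only the pocket is featurized. Because the feature count grows with the cube of the box width, widening the box to silence the warnings quickly becomes too large to fit in memory.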