Different parents for same molecule with ring substituents
Hi CSP Team (@greglandrum @eloyfelix @apbento),
First of all, many thanks to your team for providing this pipeline! I think it is a good step in the right direction toward cleaning up the mess in virtual libraries. I hope you will maintain and further develop this project; it is very useful!
I am not sure whether I am doing something wrong, but in this small example with three SMILES for the same molecule, the pipeline only matches two of them (the first and the third):
import chembl_structure_pipeline as csp
from rdkit.Chem import MolFromSmiles, MolToSmiles

def main():
    smis = [
        'c1c(C)c(C)ccc1',
        'c1cc(C)c(C)cc1',
        'c1ccc(C)c(C)c1']
    print('Input SMILES:')
    for smi in smis:
        print(smi)
    mols = [MolFromSmiles(smi) for smi in smis]
    std_mols = [csp.standardize_mol(mol) for mol in mols]
    par_mols = [csp.get_parent_mol(s_mol)[0] for s_mol in std_mols]
    par_smis = [MolToSmiles(p_mol) for p_mol in par_mols]
    print('Parent SMILES:')
    for p_smi in par_smis:
        print(p_smi)

if __name__ == '__main__':
    main()
Could you give me some feedback on this example? I have several other examples that do not get matched.
Best,
Conrad
Issue Analytics
- State:
- Created 2 years ago
- Comments:6 (2 by maintainers)
Top GitHub Comments
@conrad-stork: the problem here is that the molecules which come back from get_parent_mol() are not fully sanitized. If you sanitize the molecules before generating the SMILES, things should work.
@eloyfelix: we should think about adding a call to SanitizeMol() to the pipeline functions which return molecules.
Hi all,
Sorry for the silence on this repo. Lately we’ve been quite involved in infrastructure work at EBI, and we are also quite busy on the data side of things. I just want to reassure you that this is, and will remain, a core piece of the ChEMBL database (and hence will be maintained).
Thank you @greglandrum for keeping an eye on this. Yes, I think that makes sense, and we could also re-evaluate whether using RDKit’s sanitization is an option for us.
There are a few issues that we’d like to sort out for ChEMBL31 (v30 is to be released in ~1 month), and we’ll tackle all of them at once, since testing changes to the standardiser against the whole ChEMBL database usually takes us a bit of time.