Different parents for same molecule with ring substituents

Hi CSP Team (@greglandrum @eloyfelix @apbento),

first of all, many thanks to your team for providing such a pipeline! I think this is a good step in the right direction towards solving the mess in virtual libraries! I hope you will maintain and further develop this project. It is very useful!

I am not sure whether I am doing something wrong, but in this small example with three identical molecules (the same compound written as three different SMILES), only two (the first and the third) are matched by the pipeline:

import chembl_structure_pipeline as csp
from rdkit.Chem import MolFromSmiles, MolToSmiles

def main():
    smis = [
            'c1c(C)c(C)ccc1',
            'c1cc(C)c(C)cc1',
            'c1ccc(C)c(C)c1']

    print('Input SMILES:')
    for smi in smis:
        print(smi)

    mols = [MolFromSmiles(smi) for smi in smis]
    std_mols = [csp.standardize_mol(mol) for mol in mols]
    par_mols = [csp.get_parent_mol(s_mol)[0] for s_mol in std_mols]
    par_smis = [MolToSmiles(p_mol) for p_mol in par_mols]

    print('Parent SMILES:')
    for p_smi in par_smis:
        print(p_smi)

if __name__ == '__main__':
    main()

Could you give me some feedback on this example? I have several other examples that are not matched either.

Best,
Conrad

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Comments: 6 (2 by maintainers)

Top GitHub Comments

2 reactions
greglandrum commented, Feb 9, 2022

@conrad-stork: the problem here is that the molecules returned by get_parent_mol() are not fully sanitized. If you sanitize the molecules before generating the SMILES, things should work:

In [9]: for m in par_mols:
   ...:     Chem.SanitizeMol(m)
   ...:

In [10]: par_smis = [Chem.MolToSmiles(p_mol) for p_mol in par_mols]

In [11]: par_smis
Out[11]: ['Cc1ccccc1C', 'Cc1ccccc1C', 'Cc1ccccc1C']

@eloyfelix: we should think about adding a call to SanitizeMol() to the pipeline functions that return molecules.

1 reaction
eloyfelix commented, Feb 9, 2022

Hi all,

sorry for the silence on this repo. Lately we've been quite involved in infrastructure work at EBI, and we are also quite busy on the data side of things. I just want to reassure you that this is and will remain a core (and hence maintained) piece of the ChEMBL database.

Thank you @greglandrum for keeping an eye on this. Yes, I think that it makes sense and I think that we could also re-evaluate if using RDKit’s sanitization could also be an option for us.

There are a few issues that we'd like to sort out for ChEMBL31 (v30 is to be released in ~1 month), and we'll tackle all of them at once, since testing changes to the standardiser on the whole ChEMBL database usually takes us a bit of time.
