Interpreting malformed chemical formulas as substances

Hi, thanks for the ChemPy package! While using Substance.from_formula, I noticed that it is easy to make a mistake in the formula string and not be notified about it. For example, adding methanol works great when the formula contains no mistakes:

>>> from chempy import Substance
>>> methanol = Substance.from_formula('CH3OH')
>>> methanol.name
'CH3OH'
>>> methanol.composition
{6: 1, 1: 4, 8: 1}

If I make a minor mistake, for example forgetting to capitalize the first H for hydrogen, ChemPy gives no warning and simply stops at the last valid element. So the formula string is interpreted as simply C for carbon, even though the name is the entire supplied formula string Ch3OH:

>>> c = Substance.from_formula('Ch3OH')
>>> c.name
'Ch3OH'
>>> c.composition
{6: 1}

Is there something like a strict flag (option) to throw a warning or even exception (error) if the entire formula string cannot be interpreted as a substance? If the entire formula string cannot be interpreted as a substance, should the substance’s name have only the part that was interpreted as a substance?
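
To make the requested behavior concrete, here is a minimal sketch of a strict check that raises as soon as any part of the formula string cannot be interpreted. It is not ChemPy's parser and does not use ChemPy's API: the names strict_composition and ATOMIC_NUMBERS are hypothetical, the element table is a small illustrative subset, and only plain symbol-plus-count formulas (no parentheses, charges, or hydrates) are handled.

```python
import re

# Illustrative subset of element symbols mapped to atomic numbers (assumption:
# a real implementation would cover the whole periodic table).
ATOMIC_NUMBERS = {"H": 1, "C": 6, "N": 7, "O": 8, "Na": 11, "S": 16, "Cl": 17}

# One element symbol (capital letter, optional lowercase letter) and an optional count.
_TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")


def strict_composition(formula):
    """Return {atomic_number: count} for the whole string.

    Raises ValueError if any part of the string cannot be interpreted,
    instead of silently stopping at the last valid element.
    """
    composition = {}
    pos = 0
    while pos < len(formula):
        match = _TOKEN.match(formula, pos)
        if match is None or match.group(1) not in ATOMIC_NUMBERS:
            raise ValueError(
                f"cannot interpret {formula[pos:]!r} at position {pos} in {formula!r}"
            )
        number = ATOMIC_NUMBERS[match.group(1)]
        count = int(match.group(2) or 1)
        composition[number] = composition.get(number, 0) + count
        pos = match.end()
    return composition
```

With this sketch, strict_composition('CH3OH') gives the same composition as the methanol example above, while strict_composition('Ch3OH') raises ValueError rather than returning a composition for carbon alone. Whether a strict mode should raise or only warn is a separate design choice that this sketch does not settle.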

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 32 (4 by maintainers)

Top GitHub Comments

1 reaction
jeremyagray commented, Jan 26, 2022

IUPAC provisional recommendation (see IR-2.2.1 at bottom of page 3) lists braces as a grouping alternative, to be used alternately with parentheses to help avoid confusion:

I assumed that was the case. But nomenclature is such a wide and specialized area, sometimes it’s hard to keep track of it all.

By the way, how would treating square brackets as complexes (rather than simply a grouping alternative) affect the resulting substance?

Hopefully, it wouldn’t. I was looking more to add information as opposed to treating them differently. Things that depend on elemental composition should never know the difference. But, if something is in square brackets, then code could (optionally) treat it as a coordination complex (as opposed to a polyatomic ion or other group) and do appropriate stuff with it like deducing oxidation states, naming, decomposing the group into ligands, etc. I think my goal at the time was to use the parser with some additional nomenclature code to be able to interconvert formulas and names.
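
The idea of adding information without changing the composition can be sketched as follows. This is a hypothetical illustration, not the parser in chempy or the one described in this comment: Group, parse, and ELEMENTS are made-up names, and the element set is a small subset. Each parsed group remembers which bracket type enclosed it, the composition ignores that detail, and optional code can look up the square-bracketed groups.

```python
from dataclasses import dataclass, field

# Illustrative subset of element symbols (assumption, not a full table).
ELEMENTS = {"H", "C", "N", "O", "S", "K", "Cl", "Fe", "Cu"}
CLOSERS = {"(": ")", "[": "]", "{": "}"}


@dataclass
class Group:
    """A bracketed (or top-level) group of (item, count) pairs, where an
    item is an element symbol or a nested Group."""
    bracket: str = ""  # "", "(", "[" or "{"
    children: list = field(default_factory=list)

    def composition(self):
        """Element totals; the bracket type plays no role here."""
        totals = {}
        for item, count in self.children:
            inner = item.composition() if isinstance(item, Group) else {item: 1}
            for symbol, n in inner.items():
                totals[symbol] = totals.get(symbol, 0) + n * count
        return totals

    def complexes(self):
        """Yield every square-bracketed group, outermost first."""
        if self.bracket == "[":
            yield self
        for item, _ in self.children:
            if isinstance(item, Group):
                yield from item.complexes()


def _read_count(formula, pos):
    """Read an optional integer at pos; a missing count means 1."""
    end = pos
    while end < len(formula) and formula[end].isdigit():
        end += 1
    return (int(formula[pos:end]) if end > pos else 1), end


def _parse_group(formula, pos, bracket):
    group = Group(bracket)
    closer = CLOSERS.get(bracket)
    while pos < len(formula):
        ch = formula[pos]
        if ch == closer:
            return group, pos + 1
        if ch in CLOSERS:
            item, pos = _parse_group(formula, pos + 1, ch)
        else:
            # Prefer a two-letter symbol, then fall back to one letter.
            symbol = next((formula[pos:pos + n] for n in (2, 1)
                           if formula[pos:pos + n] in ELEMENTS), None)
            if symbol is None:
                raise ValueError(f"cannot interpret {formula[pos:]!r}")
            item, pos = symbol, pos + len(symbol)
        count, pos = _read_count(formula, pos)
        group.children.append((item, count))
    if closer is not None:
        raise ValueError(f"missing {closer!r} in {formula!r}")
    return group, pos


def parse(formula):
    """Parse a formula into a top-level Group."""
    group, _ = _parse_group(formula, 0, "")
    return group
```

In this sketch, parse('K4[Fe(CN)6]') and parse('K4(Fe(CN)6)') have identical compositions, but only the first reports a group from complexes(). Code that depends on elemental composition never sees the difference, while nomenclature code could use the extra tag to treat the group as a coordination complex.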

1reaction
jeremyagraycommented, Jan 22, 2022

I think I actually have a version of the formula parser (or at least the grammar) in chempy that handles dot notation and brackets for complexes. As I recall, it broke a fair number of the current tests, and time constraints stopped my inquiries in that direction.

While writing parsers for other situations, I have used pyparsing’s parse exceptions to signal errors, so it can be done (easily). I suppose the questions are what features are desirable in the formula parser (dot notation for hydrated crystals, square brackets for complexes, the @ symbol for caged species, etc.), what behavior is desired for the strict parser flag, and whether these are separate issues.

If I can get some clarity on these issues, I will patch the relevant bits of my parser onto the current version and investigate further.
