Feature: next generation mathJSON
Introduction
mathJSON (MASTON) has been useful to represent the content of a mathfield as an Abstract Syntax Tree in a format that can be parsed and manipulated. For example, it’s used on mathlive.io to power a computation engine that is used to evaluate expressions and plot them.
However, it has some limitations:
- it is relatively verbose, even for simple expressions, and not as easy to parse as it could be
- it is too close to the LaTeX syntactic conventions to be completely generalizable.
For example, some constructs are represented in ways that are specific to how those operations are typeset: exponentiation (e.g. `x^2`) is represented with a `sup` property.
- the syntax and semantics are closely bound, and the current implementation has no mechanism to customize either.
For example, it would be desirable to specify how arithmetic operations are performed (using native JavaScript numbers, BigInt, a third-party numerics library, etc.).
It would also be desirable to specify the syntactic rules used to parse LaTeX, in order to support custom conventions, for example how to interpret fences (`]-5, +∞)`) or other syntactic constructs, including specialized operators and functions.
As another example, from #293, `\frac{d}{dx} a + b` could be interpreted/parsed as:
- if `d` is a known variable: “((d / (d * x)) * a) + b”
- or it’s “(a + b) derived for x”
- or it’s “(a derived for x) + b”
The ‘correct’ interpretation depends entirely on the context, and there is currently no way to control this.
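To make the ambiguity concrete, the three readings can be written as three distinct expression trees. This is only an illustrative sketch: the `DerivativeReading` option and the `"derive"` function name are hypothetical and are not part of any existing API.

```typescript
type Expr = number | string | Expr[];

// Hypothetical parsing option: how should `\frac{d}{dx} a + b` be read?
type DerivativeReading = "fraction" | "derive-sum" | "derive-first-term";

// Build the tree for `\frac{d}{dx} a + b` under each reading.
// "derive" is a made-up head meaning "differentiate with respect to".
function readFracDDx(reading: DerivativeReading): Expr {
  switch (reading) {
    case "fraction":
      // ((d / (d * x)) * a) + b, with `d` treated as an ordinary variable
      return ["add", ["multiply", ["divide", "d", ["multiply", "d", "x"]], "a"], "b"];
    case "derive-sum":
      // (a + b) derived for x
      return ["derive", ["add", "a", "b"], "x"];
    case "derive-first-term":
      // (a derived for x) + b
      return ["add", ["derive", "a", "x"], "b"];
  }
}
```

The same LaTeX input yields three structurally different trees, which is why the choice has to come from configurable rules rather than from the parser alone.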
Proposal
Therefore, we propose a new version of mathJSON that will feature the following:
- Clear separation between syntax and semantics. In particular, the semantics will be completely independent of the LaTeX syntax. A “translator” from/to another syntax (MathML, ASCIIMath, etc.) could be provided.
- The syntax will be represented by a set of rules transforming a LaTeX stream into a mathJSON expression. These rules can be complemented or overridden. If no syntactic rules are provided, the result is a valid mathJSON expression representing a stream of LaTeX tokens, i.e. `["latex", "\\frac", "{", "1", "}", "{", "2", "}"]` for `\frac{1}{2}`. An option to include parsing of LaTeX commands (but not their interpretation) would result in `["latex", ["\\frac", "1", "2"]]`. A default rule would specify that `\frac` should map to the `divide` function, in which case the output would be `["divide", 1, 2]`.
- The semantics will be provided by a dictionary of symbols, specifying what each symbol represents (constant, variable, function), with associated methods to evaluate it, etc.
- Default syntax and semantics will be provided for various domains (arithmetic, algebra, calculus, etc.). Only the dictionaries relevant to the application need to be loaded.
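The three levels of output described above for `\frac{1}{2}` can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual parser: the `arity` and `defaultRules` tables, the `ParseOptions` shape, and the function names are all hypothetical, and the sketch only handles single-character tokens and single-term `{...}` groups.

```typescript
type Expr = number | string | Expr[];

// Split LaTeX into commands (e.g. \frac), braces, and other single characters.
function tokenize(latex: string): string[] {
  return latex.match(/\\[a-zA-Z]+|\S/g) ?? [];
}

// Hypothetical tables: argument count per command (pure syntax), and the
// default, overridable rules mapping a command to a function name.
const arity = new Map<string, number>([["\\frac", 2]]);
const defaultRules = new Map<string, string>([["\\frac", "divide"]]);

// Parse one term starting at tokens[pos.i]; `rules` may be empty.
function parseTerm(tokens: string[], pos: { i: number }, rules: Map<string, string>): Expr {
  const tok = tokens[pos.i++];
  if (tok === undefined) throw new Error("Unexpected end of input");
  if (tok === "{") {
    const inner = parseTerm(tokens, pos, rules); // single-term groups only
    pos.i++; // skip the closing "}"
    return inner;
  }
  const count = arity.get(tok);
  if (count !== undefined) {
    const args: Expr[] = [];
    for (let n = 0; n < count; n++) args.push(parseTerm(tokens, pos, rules));
    // With a rule, emit the mapped function; without one, keep the command.
    return [rules.get(tok) ?? tok, ...args];
  }
  // With rules loaded, digits become numbers; otherwise they stay tokens.
  return rules.size > 0 && /^\d$/.test(tok) ? Number(tok) : tok;
}

interface ParseOptions {
  parseCommands?: boolean;
  rules?: Map<string, string>;
}

function parse(latex: string, options: ParseOptions = {}): Expr {
  const tokens = tokenize(latex);
  if (options.rules) return parseTerm(tokens, { i: 0 }, options.rules);
  if (options.parseCommands) return ["latex", parseTerm(tokens, { i: 0 }, new Map())];
  return ["latex", ...tokens]; // no rules: a raw stream of LaTeX tokens
}

const raw = parse("\\frac{1}{2}");
// ["latex", "\\frac", "{", "1", "}", "{", "2", "}"]
const commands = parse("\\frac{1}{2}", { parseCommands: true });
// ["latex", ["\\frac", "1", "2"]]
const interpreted = parse("\\frac{1}{2}", { rules: defaultRules });
// ["divide", 1, 2]
```

Overriding a rule then amounts to passing a different map, for example one that sends `\frac` to an application-specific function name.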
Examples
LaTeX | mathJSON |
---|---|
`\frac{a}{1+x}` | `["divide", "a", ["add", 1, "x"]]` |
`e^{\imaginaryI \pi }+1=0` | `["eq", ["add", ["power", "e", ["multiply", "pi", "i"]], 1], 0]` |
For comparison, that last expression was represented in the previous mathJSON version as:
    {
      "fn": "equal",
      "arg": [
        {
          "fn": "add",
          "arg": [
            {
              "sym": "e",
              "sup": {
                "fn": "multiply",
                "arg": [{ "sym": "ⅈ" }, { "sym": "π" }]
              }
            },
            { "num": "1" }
          ]
        },
        { "num": "0" }
      ]
    }
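Returning to the new array form, the following sketch shows how a dictionary of symbols could supply the semantics for the first row of the Examples table. The `Definition` shape, the `arithmetic` dictionary, and the `evaluate` function are assumptions made for illustration; the proposal does not specify these names or signatures.

```typescript
type Expr = number | string | Expr[];

// A dictionary entry says what a symbol is and, for functions, how to evaluate it.
type Definition =
  | { kind: "constant"; value: number }
  | { kind: "function"; evaluate: (...args: number[]) => number };

type Dictionary = Map<string, Definition>;

// Hypothetical "arithmetic" dictionary using native JavaScript numbers.
const arithmetic: Dictionary = new Map<string, Definition>([
  ["pi", { kind: "constant", value: Math.PI }],
  ["add", { kind: "function", evaluate: (...xs) => xs.reduce((a, b) => a + b, 0) }],
  ["multiply", { kind: "function", evaluate: (...xs) => xs.reduce((a, b) => a * b, 1) }],
  ["divide", { kind: "function", evaluate: (a, b) => a / b }],
]);

// Evaluate an expression, looking symbols up in `vars` first, then in `dict`.
function evaluate(expr: Expr, dict: Dictionary, vars: Map<string, number> = new Map()): number {
  if (typeof expr === "number") return expr;
  if (typeof expr === "string") {
    const bound = vars.get(expr);
    if (bound !== undefined) return bound;
    const def = dict.get(expr);
    if (def !== undefined && def.kind === "constant") return def.value;
    throw new Error(`Unknown symbol: ${expr}`);
  }
  const [head, ...args] = expr;
  const def = typeof head === "string" ? dict.get(head) : undefined;
  if (def === undefined || def.kind !== "function") {
    throw new Error(`Unknown function: ${String(head)}`);
  }
  return def.evaluate(...args.map((arg) => evaluate(arg, dict, vars)));
}

// \frac{a}{1+x} with a = 6 and x = 2
const value = evaluate(
  ["divide", "a", ["add", 1, "x"]],
  arithmetic,
  new Map([["a", 6], ["x", 2]]),
); // 2
```

Because the expression only names `divide` and `add`, a dictionary backed by BigInt or by a third-party numerics library could in principle be substituted without changing the expression itself.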
Backward Compatibility
The new format is not backward compatible with the previous version of mathJSON. Although a “translator” between the formats could be written, we do not plan to provide one.
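For readers who need to migrate existing data, a translator could look roughly like the sketch below. It covers only the legacy fields visible in the example above (`fn`, `arg`, `sym`, `num`, `sup`); the real legacy format has more, and the renamings (`equal` to `eq`, `ⅈ` to `i`, `π` to `pi`) are assumptions inferred from the two versions of the example.

```typescript
type Expr = number | string | Expr[];

// Only the legacy fields that appear in the example; not the full format.
interface LegacyNode {
  fn?: string;
  arg?: LegacyNode[];
  sym?: string;
  num?: string;
  sup?: LegacyNode;
}

// Assumed renamings between the two formats.
const renamed = new Map<string, string>([
  ["equal", "eq"],
  ["ⅈ", "i"],
  ["π", "pi"],
]);
const rename = (name: string): string => renamed.get(name) ?? name;

function translate(node: LegacyNode): Expr {
  let base: Expr;
  if (node.fn !== undefined) {
    base = [rename(node.fn), ...(node.arg ?? []).map((child) => translate(child))];
  } else if (node.sym !== undefined) {
    base = rename(node.sym);
  } else if (node.num !== undefined) {
    base = Number(node.num);
  } else {
    throw new Error("Unsupported legacy node");
  }
  // The typesetting-oriented `sup` property becomes an explicit "power".
  return node.sup !== undefined ? ["power", base, translate(node.sup)] : base;
}

const legacy: LegacyNode = {
  fn: "equal",
  arg: [
    {
      fn: "add",
      arg: [
        { sym: "e", sup: { fn: "multiply", arg: [{ sym: "ⅈ" }, { sym: "π" }] } },
        { num: "1" },
      ],
    },
    { num: "0" },
  ],
};

const translated = translate(legacy);
// ["eq", ["add", ["power", "e", ["multiply", "i", "pi"]], 1], 0]
```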
Related Issues
This feature will address the following related issues: #437, #396, #380, #379, #293.
Top GitHub Comments
Progress update: an implementation is now available at https://github.com/cortex-js/math-json
The documentation of the API is lacking, but you can get an idea of how to use it by looking at `test/index.html`.
Update:
The work is in progress, with the core functionality implemented.
This includes support for several “forms”, including a canonical form that transforms expressions so they are written as a sum of products, with sorted arguments (for commutative functions) and using a lexdeg sort order for polynomials.
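One part of such a canonical form, sorting the arguments of commutative functions, can be sketched as below. This is not the implementation in the repository: it does not expand expressions into a sum of products and does not implement the lexdeg polynomial order. The ordering used here (numbers, then symbols, then compound expressions) is an arbitrary assumption chosen only to make the result deterministic.

```typescript
type Expr = number | string | Expr[];

// Functions assumed to be both commutative and associative.
const commutative = new Set<string>(["add", "multiply"]);

// Rank by kind: numbers first, then symbols, then compound expressions.
function rank(e: Expr): number {
  return typeof e === "number" ? 0 : typeof e === "string" ? 1 : 2;
}

function compare(a: Expr, b: Expr): number {
  if (rank(a) !== rank(b)) return rank(a) - rank(b);
  if (typeof a === "number" && typeof b === "number") return a - b;
  // Same kind: fall back to comparing the serialized forms.
  const sa = JSON.stringify(a);
  const sb = JSON.stringify(b);
  return sa < sb ? -1 : sa > sb ? 1 : 0;
}

function canonical(expr: Expr): Expr {
  if (!Array.isArray(expr) || expr.length === 0) return expr;
  const [head, ...rest] = expr;
  const args = rest.map((arg) => canonical(arg));
  if (typeof head !== "string" || !commutative.has(head)) return [head, ...args];
  // Flatten ["add", ["add", a, b], c] into ["add", a, b, c], then sort.
  const flat: Expr[] = [];
  for (const arg of args) {
    if (Array.isArray(arg) && arg[0] === head) flat.push(...arg.slice(1));
    else flat.push(arg);
  }
  flat.sort(compare);
  return [head, ...flat];
}

const sorted = canonical(["add", "x", ["add", 2, "a"], 1]);
// ["add", 1, 2, "a", "x"]
```

With a form like this, two expressions that differ only in argument order or nesting, such as `["add", "x", 1]` and `["add", 1, "x"]`, serialize identically and can be compared directly.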
What’s left to do:
Once this lands, it will be able to handle some pretty gnarly notations that were difficult/impossible to handle before, for example:
`\sin^{-1}\prime x` -> `[(["derivative", 1, ["inverse-function", "sin"]], "x")]`