MDX performance
☂️ This umbrella issue is for tracking work related to improving the performance of MDX.
I’ve been working with @pvdz on MDX performance. We’ve noted a few aspects that add unnecessary work which we should be able to reduce, especially in v2.
Numerous babel parse and transformation steps
Firstly, we have multiple babel parse steps throughout the MDX transpilation pipeline.
Imports and exports
- Partitioning imports and exports
- Finding the default export
Peter has done some work here in gatsby-plugin-mdx (gatsbyjs/gatsby#25437) that we can potentially adapt for usage in core.
Shortcode generation
We use babel to figure out what imports and exports exist, and then use that to instantiate variables coming from MDXProvider with `makeShortcode`. Also related to gatsbyjs/gatsby#25437.
mdxType
This is used by the runtime (react/preact/vue) to determine which component to render. This is something we can do from the MDXAST in v2 since the JSX structure is represented.
Returning a compiled string that inevitably needs to be transpiled
Secondly, in addition to these parse steps, we also return a JSX string. In nearly all cases this JSX string is then transpiled to JS and `mdx` pragma function calls. This was originally an intentional output because we wanted to make MDX more palatable and familiar. However, it might make sense to serialize directly to function calls and JS.
This would remove a babel step users need (unless they’re using optional syntax or need browser polyfills which is still achievable in user land).
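As a rough illustration of what serializing directly to function calls could look like, here is a minimal sketch. The node shape (`type`, `props`, `children`) and the `toCalls` helper are hypothetical and much simpler than the real MDX AST; the sketch only assumes that the `mdx` pragma takes `(type, props, ...children)` like other JSX pragmas, and it only handles HTML element names with JSON-serializable props.

```javascript
// Hypothetical node shape: { type, props, children }, where children are
// strings or further nodes. This is NOT the real MDXAST/MDXHAST format.
function toCalls(node) {
  // Text nodes become string literals.
  if (typeof node === 'string') return JSON.stringify(node);

  // Empty or missing props are emitted as `null`, as a JSX compiler would do.
  const hasProps = node.props && Object.keys(node.props).length > 0;
  const props = hasProps ? JSON.stringify(node.props) : 'null';

  // Children are serialized recursively and passed as trailing arguments.
  const children = (node.children || []).map(toCalls);
  const args = [JSON.stringify(node.type), props, ...children];
  return `mdx(${args.join(', ')})`;
}

const tree = {
  type: 'h1',
  props: {},
  children: ['Hello, ', { type: 'em', children: ['world'] }],
};

console.log(toCalls(tree));
// mdx("h1", null, "Hello, ", mdx("em", null, "world"))
```

Output like this needs no JSX transform afterwards, which is the step the proposal would remove.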
You all are welcome to bring up other areas of the codebase we can make more performant or other ideas as well! In fact, we’d love your thoughts.
Top GitHub Comments
Yeah so if we keep certain artificial limitations (which already apply today) in place then we can distill the imports/exports from the mdast without the need for Babel. That’s been the source of some significant perf improvements at startup time (like https://github.com/gatsbyjs/gatsby/pull/25757).
The reasoning here is that the `import` and `export` syntax is very strict and if we disallow comments in between then a regular expression or simple string manipulation can quickly get us the answers we need (-> the symbols being imported and exported).
For imports the only limitation might be not to allow comments inside an import and only at the end of a line. These are the forms of import:
import ID from 'y'
import * as ID from 'y'
import {ID} from 'y'
import {ID as ID2} from 'y'
import ID, {ID2} from 'y'
import ID, * as ID2 from 'y'
The `{}` pattern can repeat and in each case `as` is optional. For the fix in Gatsby, to get the imported idents, I took these imports and used a regex to remove all parts that I was not interested in, leaving us with comma separated sets of `ID` or `ID as ID2`. You can easily take the last ID and that’ll be the one you want.
Leaning on the fact that imports are constants (and valid input), there is no further need to dedupe them.
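The approach described above can be sketched roughly as follows. This is not the code from the Gatsby fix; `importedNames` is a hypothetical helper that assumes a single, valid import declaration with no comments inside it and whitespace around the `import` and `from` keywords.

```javascript
// Hypothetical helper, not taken from MDX or gatsby-plugin-mdx.
// Assumes one valid import declaration without comments.
function importedNames(statement) {
  // Capture only the binding clause between `import` and `from`.
  const match = /^\s*import\s+([\s\S]+?)\s+from\s*['"][^'"]+['"]\s*;?\s*$/.exec(statement);
  if (!match) return []; // e.g. side-effect imports such as: import 'y'

  return match[1]
    .replace(/[{}]/g, '') // braces carry no names themselves
    .split(',')
    .map((part) => part.trim())
    .filter(Boolean) // tolerate trailing commas
    // For `ID as ID2` or `* as ID`, the local name is the last token.
    .map((part) => part.split(/\s+/).pop());
}

console.log(importedNames("import ID, {ID2 as ID3} from 'y'")); // [ 'ID', 'ID3' ]
```

Side-effect imports without `from` simply yield no names here, which matches the suggestion to treat that variant separately.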
So to make life easy, the only syntactical restriction, beyond non-standard syntax of course, is to disallow comments inside the import declaration. And maybe disallow the variant where `from` is omitted (where you import a module for side effects).
For exports it’s a little trickier, mainly because you can export arbitrary expressions and because of defaults in destructuring. However, it turns out that exports are currently limited to a single line. That’s great because that makes them easy to slice out.
Furthermore, if you apply the same comment restriction to exports and disallow destructuring defaults, you can “cheat” your way out of requiring any JS parser and still distill all the exported symbols, as well as find the default export. You can even support the newer `export <pattern> from 'file'`, which I believe is currently not supported.
export default function abc(){}
export const foo = bar
export class Boo {}
export { ding, dong as dang }
export let [a, b] = obj
export let [a = 1, b = 2] = obj <-- this is the one to disallow
In all the above cases, except the last, you can parse up to the first `=` character (for `var`, `let`, and `const` exports) to get all the exported symbol names safely. The syntax for `function` and `class` is restricted enough by itself. The re-export syntax can be handled similarly to the imports above. All in all, it’ll be much faster than the overhead of a full JS parse.
For JSX serialization you can use a faster parser/printer than Babel. I know Acorn can do it. There’s also Sucrase, and a few others.
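The export rules described above could be sketched like this. `exportedNames` and `lastTokens` are hypothetical helpers, and the sketch assumes one valid, single-line export with no comments, no destructuring defaults, and a single declarator per `var`/`let`/`const`.

```javascript
// Hypothetical helpers; assumptions are listed in the paragraph above.
function exportedNames(line) {
  const src = line.trim();

  // export default ... : report the default export under the name `default`.
  if (/^export\s+default\b/.test(src)) return ['default'];

  // export function abc(){} / export class Boo {}
  const decl = /^export\s+(?:async\s+)?(?:function\s*\*?|class)\s*([A-Za-z_$][\w$]*)/.exec(src);
  if (decl) return [decl[1]];

  // export { ding, dong as dang }, optionally followed by: from 'file'
  const list = /^export\s*\{([^}]*)\}/.exec(src);
  if (list) return lastTokens(list[1], /\s+/);

  // export const foo = bar / export let [a, b] = obj / export let {a, b: c} = obj
  // Only the text before the first `=` is inspected.
  const vars = /^export\s+(?:var|let|const)\s+([^=;]+)/.exec(src);
  if (vars) return lastTokens(vars[1].replace(/[\[\]{}]/g, ''), /\s*:\s*/);

  return [];
}

// Split a comma separated list and keep the last token of each entry,
// which is the exported (or locally bound) name.
function lastTokens(text, separator) {
  return text
    .split(',')
    .map((part) => part.trim())
    .filter(Boolean)
    .map((part) => part.split(separator).pop().replace(/^\.\.\./, ''));
}

console.log(exportedNames('export { ding, dong as dang }')); // [ 'ding', 'dang' ]
console.log(exportedNames('export let [a, b] = obj'));       // [ 'a', 'b' ]
```

With destructuring defaults such as `export let [a = 1, b = 2] = obj`, stopping at the first `=` would miss `b`, which is why that form has to be disallowed for this shortcut to be safe.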
My suggestion to John was to default to anything fast and to expose an option for the user to do it for you instead, since mdx doesn’t reaaally care how the jsx gets compiled to JS. Or wouldn’t need to, as far as I understand. So a user could give mdx a callback like
function callback(jsxString) { return parser(jsxString).serialize(pragma); }
and mdx would just run it instead.
If I’m not mistaken, this way MDX wouldn’t need to run a JS parser at all.
One other potential trick is to concat the jsx expressions together with a searchable separator (an identifier of sorts or the `debugger` statement). Feed them to a parser, print them again, split on the debugger statement (or whatever you pick). That may already be what’s happening now, I’m not sure…?
Oh and a third option is to allow the user to pass through a Babel config / options for the whole build step. That way if Babel is run inside MDX anyways, it can just as well also do all the other transformations, like polyfill transforms etc, so that the main pipeline doesn’t need to process it again. Potentially. But that might be a pretty big Pandora’s box of complexity to open up.
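The separator trick mentioned above might look roughly like this. `transformBatch` is a hypothetical helper, and `transform` stands in for whichever JSX compiler is used; the sketch assumes that compiler takes and returns source text, keeps `debugger` statements in place, and that no expression itself contains the text `debugger;`.

```javascript
// Hypothetical helper: compile many JSX expressions with ONE transform call.
const SEPARATOR = '\ndebugger;\n';

function transformBatch(expressions, transform) {
  // Wrap each expression so it forms a complete statement, then join
  // everything into a single source string.
  const joined = expressions.map((expr) => `(${expr});`).join(SEPARATOR);

  // One parse/print round trip instead of one per expression.
  const output = transform(joined);

  // Split the printed result back into one chunk per input expression.
  return output.split(/\s*debugger;\s*/).map((chunk) => chunk.trim());
}

// Stand-in transform for demonstration only: rewrites `<a />` to `mdx("a")`.
const fakeTransform = (source) => source.replace(/<(\w+) \/>/g, 'mdx("$1")');

console.log(transformBatch(['<a />', '<b />'], fakeTransform));
// [ '(mdx("a"));', '(mdx("b"));' ]
```

Whether this beats many small calls depends on the parser's startup cost per call, so it would need benchmarking.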
Probably also faster if we only process jsx there, and nothing else, leaving that up to folks. But indeed, wondering on the benchmarks of 100 expressions vs 1 file