Pre-optimize TM grammars for performance and reduced bundle size #243405

slevithan · 2025-03-13T00:11:29Z

TextMate grammars used for syntax highlighting can individually contain thousands of regexes and are, collectively, quite large. Their regexes can be optimized via minification that also improves their performance. An existing library, oniguruma-parser's Optimizer module, is made specifically for this -- it minifies Oniguruma regexes (the regex flavor used by TM grammars) without any change to what they match, and it applies automatic performance improvements to some regexes.

oniguruma-parser's optimizer has been battle-tested by the popular Shiki library, which recently started running all of its more than 220 included TM grammars through it (the underlying Oniguruma parser has been used by Shiki's JS engine for much longer). Shiki does so in tm-grammars, and tests that syntax highlighting results are identical for all grammars before and after optimization. The size reduction and performance improvements are significant for some grammars. E.g., it shaves more than 35,000 characters off the regexes in just the C++ grammar (which doesn't include any whitespace or comments), and it improves the C++ grammar's performance by making changes that significantly reduce the amount of backtracking needed by some very large, complex, and slow regexes (again, without any changes to what any of the regexes match). You can see how Shiki applies it to TM grammars here.
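To make the idea concrete, here is a rough sketch of what a build-time pre-optimization pass over a grammar could look like. It is not Shiki's or oniguruma-parser's actual code: `OptimizeFn`, `Rule`, `Stats`, and `optimizeRule` are hypothetical names, and the optimizer is passed in as a plain function so the sketch doesn't assume any particular library API. It walks the standard TM grammar fields (`match`, `begin`, `end`, `while`, plus nested `patterns`, `repository`, and capture maps) and keeps the original pattern whenever the optimizer throws.

```typescript
// Hypothetical signature: takes an Oniguruma pattern and returns an
// equivalent pattern. This is an assumption, not a real library API.
type OptimizeFn = (pattern: string) => string;

// Subset of TextMate grammar rule fields that matter for this pass.
interface Rule {
  match?: string;
  begin?: string;
  end?: string;
  while?: string;
  patterns?: Rule[];
  repository?: Record<string, Rule>;
  captures?: Record<string, Rule>;
  beginCaptures?: Record<string, Rule>;
  endCaptures?: Record<string, Rule>;
  whileCaptures?: Record<string, Rule>;
  [key: string]: unknown;
}

interface Stats {
  charsSaved: number; // total reduction in pattern length
  skipped: number; // patterns left as-is because the optimizer threw
}

const REGEX_KEYS = ["match", "begin", "end", "while"] as const;
const MAP_KEYS = [
  "repository",
  "captures",
  "beginCaptures",
  "endCaptures",
  "whileCaptures",
] as const;

// Returns an optimized copy of `rule`; the input is not mutated.
function optimizeRule(rule: Rule, optimize: OptimizeFn, stats: Stats): Rule {
  const out: Rule = { ...rule };

  for (const key of REGEX_KEYS) {
    const source = rule[key];
    if (typeof source !== "string") continue;
    let optimized = source;
    try {
      optimized = optimize(source);
    } catch {
      // Fall back to the original pattern rather than failing the build.
      stats.skipped++;
    }
    out[key] = optimized;
    stats.charsSaved += source.length - optimized.length;
  }

  // Recurse into nested rule lists.
  if (rule.patterns) {
    out.patterns = rule.patterns.map((r) => optimizeRule(r, optimize, stats));
  }

  // Recurse into named rule maps (repository and capture groups).
  for (const key of MAP_KEYS) {
    const map = rule[key];
    if (!map) continue;
    const next: Record<string, Rule> = {};
    for (const [name, child] of Object.entries(map)) {
      next[name] = optimizeRule(child, optimize, stats);
    }
    out[key] = next;
  }

  return out;
}
```

A build script could call `optimizeRule(grammar, optimize, stats)` once per grammar file and write the result back out, then compare tokenization results before and after as Shiki does. One thing a real pass would need to handle carefully: `end` and `while` patterns can contain backreferences to captures from `begin`, so they can't always be treated as standalone regexes.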

CCing @alexr00 since she's been handling TM grammar updates.


vs-code-engineering bot assigned alexr00 Mar 13, 2025
