Pre-optimize TM grammars for performance and reduced bundle size #243405

Open
slevithan opened this issue Mar 13, 2025 · 0 comments
slevithan commented Mar 13, 2025

TextMate grammars used for syntax highlighting can individually contain thousands of regexes and are, collectively, quite large. Their regexes can be minified in ways that also improve their performance. An existing library, oniguruma-parser's Optimizer module, is made specifically for this: it minifies Oniguruma regexes (the regex flavor used by TM grammars) without any change to what they match, and it applies automatic performance improvements to some regexes.

oniguruma-parser's optimizer has been battle-tested by the popular Shiki library, which recently started running all of its more than 220 included TM grammars through it (the underlying Oniguruma parser has been used by Shiki's JS engine for much longer). Shiki does so in tm-grammars, and tests that syntax highlighting results are identical for all grammars before and after optimization. The size reduction and performance improvements are significant for some grammars. E.g., it shaves more than 35,000 characters off of regexes in just the C++ grammar (which doesn't include any whitespace or comments), and it improves the C++ grammar's performance by making changes that significantly reduce the amount of backtracking needed by some very large, complex, and slow regexes (again, without any changes to what any of the regexes match). You can see how Shiki applies it to TM grammars here.
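To make the idea concrete, here is a rough sketch of what a build-time pre-optimization pass over a grammar could look like. It is not Shiki's or VS Code's actual code. The `Rule` shape, `optimizeRule`, and the `OptimizePattern` callback are hypothetical names for illustration; in a real build step the callback would wrap the optimizer library, and the toy `unwrapSingleCharClass` below only stands in for it so the sketch runs on its own.

```typescript
// Hypothetical, simplified shape of a TM grammar rule (not exhaustive).
interface Rule {
  match?: string;
  begin?: string;
  end?: string;
  while?: string;
  patterns?: Rule[];
  captures?: Record<string, Rule>;
  beginCaptures?: Record<string, Rule>;
  endCaptures?: Record<string, Rule>;
  whileCaptures?: Record<string, Rule>;
  repository?: Record<string, Rule>;
  injections?: Record<string, Rule>;
  [key: string]: unknown;
}

// Assumption: the real optimizer is wrapped as a string -> string function.
type OptimizePattern = (pattern: string) => string;

const REGEX_KEYS = ['match', 'begin', 'end', 'while'] as const;
const RULE_MAP_KEYS = [
  'captures', 'beginCaptures', 'endCaptures', 'whileCaptures',
  'repository', 'injections',
] as const;

// Returns a copy of `rule` with every regex field passed through the optimizer.
function optimizeRule(rule: Rule, optimizePattern: OptimizePattern): Rule {
  const out: Rule = { ...rule };
  for (const key of REGEX_KEYS) {
    const source = rule[key];
    if (typeof source !== 'string') continue;
    try {
      // Note: `end`/`while` patterns may contain backreferences to `begin`
      // captures, so a real optimizer has to tolerate those.
      out[key] = optimizePattern(source);
    } catch {
      // If the optimizer rejects a pattern, keep the original unchanged.
      out[key] = source;
    }
  }
  if (rule.patterns) {
    out.patterns = rule.patterns.map((child) => optimizeRule(child, optimizePattern));
  }
  for (const key of RULE_MAP_KEYS) {
    const map = rule[key];
    if (!map) continue;
    const next: Record<string, Rule> = {};
    for (const [name, child] of Object.entries(map)) {
      next[name] = optimizeRule(child, optimizePattern);
    }
    out[key] = next;
  }
  return out;
}

// Toy stand-in for a real optimizer: unwraps classes like `[a]` to `a`.
// It ignores escaping and is NOT safe for real patterns.
const unwrapSingleCharClass: OptimizePattern = (pattern) =>
  pattern.replace(/\[([A-Za-z_])\]/g, '$1');

const grammar: Rule = {
  patterns: [{ match: '[_]+' }, { include: '#block' }],
  repository: {
    block: { begin: '[a]b', end: '[c]', patterns: [{ match: '[x]y' }] },
  },
};

const optimized = optimizeRule(grammar, unwrapSingleCharClass);
const savedChars = JSON.stringify(grammar).length - JSON.stringify(optimized).length;
console.log(`Saved ${savedChars} characters`);
```

Because the pass returns a new grammar object rather than mutating the input, the original and optimized grammars can both be kept around and tokenized against the same sample files to confirm identical highlighting, which is the kind of before/after check described above.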

CCing @alexr00 since she's been handling TM grammar updates.
