Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix flex scanner push-back overflow issue #3824

Merged
merged 2 commits into from
Mar 22, 2025

Conversation

wokron
Copy link
Contributor

@wokron wokron commented Feb 25, 2025

Fixes #3215

This issue was caused by two factors:

  1. Unlike yy_create_buffer(), the buffer created by calling yy_scan_string() was only the size of the string, not YY_BUF_SIZE, which makes it easy to encounter buffer overflow.
  2. During macro processing, YY_BUF_SIZE was assumed to be the remaining buffer size, leading to calls to unput() even when the buffer was full. (Fix lexer buffer size check #2313)

I defined the yy_scan_string_with_size() function to create a buffer of a specified size (MAX_BUF_SIZE). And defined the yy_is_buffer_full() function to check if the buffer is full, preventing calls to unput() when the buffer is already full.

Checklist
  • Language changes are updated in man/adoc/bpftrace.adoc
  • User-visible and non-trivial changes updated in CHANGELOG.md
  • The new behaviour is covered by tests

@wokron wokron force-pushed the fix-scanner-push-back-overflow branch from 8022df4 to 966e26d Compare February 25, 2025 02:56
Copy link
Contributor

@jordalgo jordalgo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this! I see in the issue that @ajor has a minimal repo. Do you mind adding a test that failed before and now passes with this change?

@wokron wokron force-pushed the fix-scanner-push-back-overflow branch from 1d4bc53 to 311fbca Compare February 26, 2025 13:40
@wokron
Copy link
Contributor Author

wokron commented Feb 26, 2025

I added two test cases that would previously cause a "flex scanner push-back overflow" error.

#define M M+1
BEGIN { M; }

This case easily exceeds the recursion limit due to the recursive macro M.

#define M M+1
BEGIN { M; }
// very very long code below

In this case, the source string initially fills the entire buffer. The recursive macro will fill the buffer again before reaching the recursion limit. The new code will detect this situation.

@viktormalik
Copy link
Contributor

@jordalgo I assume that we'll want to backport this to the 0.23.0 release?

@wokron wokron force-pushed the fix-scanner-push-back-overflow branch 2 times, most recently from 4013faa to 2c44af9 Compare February 28, 2025 09:29
jordalgo
jordalgo previously approved these changes Feb 28, 2025
Copy link
Contributor

@jordalgo jordalgo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks for the contribution!

Copy link
Member

@danobi danobi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Appreciate the fix, but the biggest issue IMO is it reaches too deep into flex internals. I think we need a way to do this with public API.

src/driver.cpp Outdated
@@ -7,8 +7,12 @@

struct yy_buffer_state;

extern struct yy_buffer_state *yy_scan_string(const char *yy_str,
yyscan_t yyscanner);
#ifndef MIN_YY_BUF_SIZE
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not some preprocessor definition exported by flex, right? IMO better to drop the #ifndef - it's confusing b/c it has no users (overrides) in our codebase.

src/driver.cpp Outdated
auto &source = Log::get().get_source();
yy_scan_string_with_size(source.c_str(),
std::max(source.size(),
static_cast<size_t>(MIN_YY_BUF_SIZE)),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you need this to be size_t, maybe just declare it as static constexpr size_t, then? Instead of a macro

src/lexer.l Outdated
Comment on lines 300 to 316
YY_BUFFER_STATE yy_scan_string_with_size(const char *yystr, size_t size, yyscan_t yyscanner) {
YY_BUFFER_STATE b = yy_scan_string(yystr, yyscanner);

// If we use yy_scan_string(), the buffer size is set to strlen(yystr) + 2.
// So we need to adjust the buffer to the given size.
b->yy_buf_pos = b->yy_ch_buf = (char *)yyrealloc((void *)b->yy_ch_buf, size + 2, yyscanner);
b->yy_buf_size = size;

return b;
}

bool yy_is_buffer_full(struct yyguts_t *yyg) {
char *yy_cp = yyg->yy_c_buf_p;
return (yy_cp < YY_CURRENT_BUFFER_LVALUE->yy_ch_buf + 2 &&
YY_CURRENT_BUFFER_LVALUE->yy_n_chars == YY_CURRENT_BUFFER_LVALUE->yy_buf_size);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While I'll grant this looks like it works (for now at least), it seems like this reaches way into the internals of flex. I suspect this will cause maintenance issues sooner or later, as I don't see any of these fields/types reference in the flex API.

Please consult https://westes.github.io/flex/manual/Index-of-Variables.html#Index-of-Variables and only use things in public API (or prove that it's indeed intended to be public but has no listing).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right, I may have focused too much on the flex-generated lex.yy.cc file. So I reviewed the flex documentation and came up with another approach (2c44af9). This method is documented (YY_INPUT, overriding). If it is better, I can squash this commit.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Basically, the new approach is to use a custom function to override the behavior of YY_INPUT, changing the input from a file to the source string we provide. In this case, the buffer size is the default YY_BUF_SIZE, and each read will copy up to YY_READ_BUF_SIZE bytes (half of the buffer size). This gives us enough space for macro expansion (larger than the macro recursion limit of 1000 bytes).

@danobi
Copy link
Member

danobi commented Mar 13, 2025

Sorry about the delay - this slipped off my radar. I will take a look this weekend.

tests/parser.cpp Outdated
Comment on lines 2877 to 3041
BPFtrace bpftrace;

// A recursive macro expansion
bpftrace.macros_.emplace("M", "M+1");
test_parse_failure(bpftrace,
"#define M M+1\n"
"BEGIN { M; }",
R"(
stdin:2:9-343: ERROR: Macro recursion limit reached: M, M+1
BEGIN { M; }
~~~~
stdin:2:9-343: ERROR: syntax error, unexpected end of file
BEGIN { M; }
~~~~
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think parser inserts this, right? So not necessary to do anything fancy.

Suggested change
BPFtrace bpftrace;
// A recursive macro expansion
bpftrace.macros_.emplace("M", "M+1");
test_parse_failure(bpftrace,
"#define M M+1\n"
"BEGIN { M; }",
R"(
stdin:2:9-343: ERROR: Macro recursion limit reached: M, M+1
BEGIN { M; }
~~~~
stdin:2:9-343: ERROR: syntax error, unexpected end of file
BEGIN { M; }
~~~~
test_parse_failure("#define M M+1\n"
"BEGIN { M; }",
R"(
stdin:2:9-343: ERROR: Macro recursion limit reached: M, M+1
BEGIN { M; }
~~~~
stdin:2:9-343: ERROR: syntax error, unexpected end of file
BEGIN { M; }
~~~~

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, it doesn't 😂. Macro parsing is done by ClangParser (#3824 (comment)). I have added a comment to clarify this.

Copy link
Member

@danobi danobi Mar 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, got it. How about writing a new helper, maybe test_macro_parse_failure(...)? We recently overhauled the pass manager, so should be straightforward. Example of a helper I recently added:

void test(BPFtrace &bpftrace,
const std::string &input,
const std::string &output)
{
ast::ASTContext ast("stdin", input);
auto ok = ast::PassManager()
.put(ast)
.put(bpftrace)
.add(CreateParsePass())
.add(ast::CreateDeprecatedPass())
.run();
ASSERT_TRUE(bool(ok));
std::stringstream out;
ast.diagnostics().emit(out);
EXPECT_THAT(out.str(), HasSubstr(output));
}

Instead of ast::CreateDeprecatedPass() you'd have CreateClangPass()

Reason is it'd be good to move the tests towards stricter input -> output and avoid reaching into classes to mock.

@wokron wokron force-pushed the fix-scanner-push-back-overflow branch from e70e6f7 to 93f3c66 Compare March 18, 2025 07:12
@danobi
Copy link
Member

danobi commented Mar 21, 2025

Minor nit on the unit tests but LGTM otherwise.

wokron added 2 commits March 22, 2025 14:37
If we use yy_scan_string to scan the source
string, the buffer size will be set to the size of
the string, rather than the default YY_BUF_SIZE.
This can easily cause a "push-back overflow" issue
when using the unput() function due to the smaller
buffer size.

Previously, we assumed the remaining buffer size
was always YY_BUF_SIZE, but unparsed strings were
still stored in the buffer. This also led to
"push-back overflow" issues.

We solve this issue by overriding YY_INPUT. In this
case, the buffer size will be the default
YY_BUF_SIZE (16384), and each read will only copy
YY_READ_BUF_SIZE (8192) bytes of data to the buffer.
This means we have 8192 bytes available for macro
expansion, which is greater than the expansion
limit of 1000 bytes.

Signed-off-by: Yitang Yang <stringcatwok@gmail.com>
Signed-off-by: Yitang Yang <stringcatwok@gmail.com>
@wokron wokron force-pushed the fix-scanner-push-back-overflow branch from 93f3c66 to 43143ca Compare March 22, 2025 06:45
Copy link
Member

@danobi danobi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for working on this!

@danobi danobi merged commit 4c01264 into bpftrace:master Mar 22, 2025
19 checks passed
@viktormalik
Copy link
Contributor

Do we want to backport this to the 0.23.x branch?

@jordalgo
Copy link
Contributor

@viktormalik Good reminder. I'll put up a backport PR.

@jordalgo
Copy link
Contributor

@viktormalik These commits aren't cherry-picking cleanly onto the release branch. I'll poke at it a little longer but maybe we should just leave it out of the release branch (unless someone else has the time to do this safely/confidently).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

flex scanner push-back overflow
4 participants