Fix flex scanner push-back overflow issue #3824

wokron · 2025-02-25T02:23:29Z

This issue was caused by two factors:

Unlike yy_create_buffer(), the buffer created by calling yy_scan_string() was only the size of the string, not YY_BUF_SIZE, which makes it easy to encounter buffer overflow.
During macro processing, YY_BUF_SIZE was assumed to be the remaining buffer size, leading to calls to unput() even when the buffer was full. (Fix lexer buffer size check #2313)

I defined the yy_scan_string_with_size() function to create a buffer of a specified size (MAX_BUF_SIZE). And defined the yy_is_buffer_full() function to check if the buffer is full, preventing calls to unput() when the buffer is already full.

Checklist

Language changes are updated in man/adoc/bpftrace.adoc
User-visible and non-trivial changes updated in CHANGELOG.md
The new behaviour is covered by tests

jordalgo

Thanks for working on this! I see in the issue that @ajor has a minimal repo. Do you mind adding a test that failed before and now passes with this change?

src/driver.cpp

src/lexer.l

tests/parser.cpp

wokron · 2025-02-26T13:59:30Z

I added two test cases that would previously cause a "flex scanner push-back overflow" error.

#define M M+1
BEGIN { M; }

This case easily exceeds the recursion limit due to the recursive macro M.

#define M M+1
BEGIN { M; }
// very very long code below

In this case, the source string initially fills the entire buffer. The recursive macro will fill the buffer again before reaching the recursion limit. The new code will detect this situation.

viktormalik · 2025-02-27T08:52:13Z

@jordalgo I assume that we'll want to backport this to the 0.23.0 release?

tests/parser.cpp

jordalgo

LGTM. Thanks for the contribution!

danobi

Appreciate the fix, but the biggest issue IMO is it reaches too deep into flex internals. I think we need a way to do this with public API.

danobi · 2025-03-03T17:19:26Z

src/driver.cpp

@@ -7,8 +7,12 @@

 struct yy_buffer_state;

-extern struct yy_buffer_state *yy_scan_string(const char *yy_str,
-                                              yyscan_t yyscanner);
+#ifndef MIN_YY_BUF_SIZE


This is not some preprocessor definition exported by flex, right? IMO better to drop the #ifndef - it's confusing b/c it has no users (overrides) in our codebase.

danobi · 2025-03-03T17:26:41Z

src/driver.cpp

+  auto &source = Log::get().get_source();
+  yy_scan_string_with_size(source.c_str(),
+                           std::max(source.size(),
+                                    static_cast<size_t>(MIN_YY_BUF_SIZE)),


If you need this to be size_t, maybe just declare it as static constexpr size_t, then? Instead of a macro

danobi · 2025-03-03T17:35:26Z

src/lexer.l

+YY_BUFFER_STATE yy_scan_string_with_size(const char *yystr, size_t size, yyscan_t yyscanner) {
+  YY_BUFFER_STATE b = yy_scan_string(yystr, yyscanner);
+
+  // If we use yy_scan_string(), the buffer size is set to strlen(yystr) + 2.
+  // So we need to adjust the buffer to the given size.
+  b->yy_buf_pos = b->yy_ch_buf = (char *)yyrealloc((void *)b->yy_ch_buf, size + 2, yyscanner);
+  b->yy_buf_size = size;
+
+  return b;
+}
+
+bool yy_is_buffer_full(struct yyguts_t *yyg) {
+  char *yy_cp = yyg->yy_c_buf_p;
+  return (yy_cp < YY_CURRENT_BUFFER_LVALUE->yy_ch_buf + 2 &&
+          YY_CURRENT_BUFFER_LVALUE->yy_n_chars == YY_CURRENT_BUFFER_LVALUE->yy_buf_size);
+}


While I'll grant this looks like it works (for now at least), it seems like this reaches way into the internals of flex. I suspect this will cause maintenance issues sooner or later, as I don't see any of these fields/types reference in the flex API.

Please consult https://westes.github.io/flex/manual/Index-of-Variables.html#Index-of-Variables and only use things in public API (or prove that it's indeed intended to be public but has no listing).

You are right, I may have focused too much on the flex-generated lex.yy.cc file. So I reviewed the flex documentation and came up with another approach (2c44af9). This method is documented (YY_INPUT, overriding). If it is better, I can squash this commit.

Basically, the new approach is to use a custom function to override the behavior of YY_INPUT, changing the input from a file to the source string we provide. In this case, the buffer size is the default YY_BUF_SIZE, and each read will copy up to YY_READ_BUF_SIZE bytes (half of the buffer size). This gives us enough space for macro expansion (larger than the macro recursion limit of 1000 bytes).

danobi · 2025-03-13T20:38:50Z

Sorry about the delay - this slipped off my radar. I will take a look this weekend.

src/lexer.l

danobi · 2025-03-17T00:21:44Z

tests/parser.cpp

+  BPFtrace bpftrace;
+
+  // A recursive macro expansion
+  bpftrace.macros_.emplace("M", "M+1");
+  test_parse_failure(bpftrace,
+                     "#define M M+1\n"
+                     "BEGIN { M; }",
+                     R"(
+stdin:2:9-343: ERROR: Macro recursion limit reached: M, M+1
+BEGIN { M; }
+        ~~~~
+stdin:2:9-343: ERROR: syntax error, unexpected end of file
+BEGIN { M; }
+        ~~~~


I think parser inserts this, right? So not necessary to do anything fancy.

Suggested change

BPFtrace bpftrace;

// A recursive macro expansion

bpftrace.macros_.emplace("M", "M+1");

test_parse_failure(bpftrace,

"#define M M+1\n"

"BEGIN { M; }",

R"(

stdin:2:9-343: ERROR: Macro recursion limit reached: M, M+1

BEGIN { M; }

~~~~

stdin:2:9-343: ERROR: syntax error, unexpected end of file

BEGIN { M; }

~~~~

test_parse_failure("#define M M+1\n"

"BEGIN { M; }",

R"(

stdin:2:9-343: ERROR: Macro recursion limit reached: M, M+1

BEGIN { M; }

~~~~

stdin:2:9-343: ERROR: syntax error, unexpected end of file

BEGIN { M; }

~~~~

Actually, it doesn't 😂. Macro parsing is done by ClangParser (#3824 (comment)). I have added a comment to clarify this.

Ah, got it. How about writing a new helper, maybe test_macro_parse_failure(...)? We recently overhauled the pass manager, so should be straightforward. Example of a helper I recently added:

bpftrace/tests/deprecated.cpp

Lines 11 to 27 in 882c183

void test(BPFtrace &bpftrace,

const std::string &input,

const std::string &output)

{

ast::ASTContext ast("stdin", input);

auto ok = ast::PassManager()

.put(ast)

.put(bpftrace)

.add(CreateParsePass())

.add(ast::CreateDeprecatedPass())

.run();

ASSERT_TRUE(bool(ok));

std::stringstream out;

ast.diagnostics().emit(out);

EXPECT_THAT(out.str(), HasSubstr(output));

}

Instead of ast::CreateDeprecatedPass() you'd have CreateClangPass()

Reason is it'd be good to move the tests towards stricter input -> output and avoid reaching into classes to mock.

src/lexer.l

danobi · 2025-03-21T22:17:26Z

Minor nit on the unit tests but LGTM otherwise.

If we use yy_scan_string to scan the source string, the buffer size will be set to the size of the string, rather than the default YY_BUF_SIZE. This can easily cause a "push-back overflow" issue when using the unput() function due to the smaller buffer size. Previously, we assumed the remaining buffer size was always YY_BUF_SIZE, but unparsed strings were still stored in the buffer. This also led to "push-back overflow" issues. We solve this issue by overriding YY_INPUT. In this case, the buffer size will be the default YY_BUF_SIZE (16384), and each read will only copy YY_READ_BUF_SIZE (8192) bytes of data to the buffer. This means we have 8192 bytes available for macro expansion, which is greater than the expansion limit of 1000 bytes. Signed-off-by: Yitang Yang <stringcatwok@gmail.com>

Signed-off-by: Yitang Yang <stringcatwok@gmail.com>

danobi

thanks for working on this!

viktormalik · 2025-03-24T14:24:12Z

Do we want to backport this to the 0.23.x branch?

jordalgo · 2025-03-24T15:57:05Z

@viktormalik Good reminder. I'll put up a backport PR.

jordalgo · 2025-03-24T19:15:35Z

@viktormalik These commits aren't cherry-picking cleanly onto the release branch. I'll poke at it a little longer but maybe we should just leave it out of the release branch (unless someone else has the time to do this safely/confidently).

wokron requested review from ajor, viktormalik, danobi, fbs and jordalgo as code owners February 25, 2025 02:23

wokron force-pushed the fix-scanner-push-back-overflow branch from 8022df4 to 966e26d Compare February 25, 2025 02:56

jordalgo requested changes Feb 25, 2025

View reviewed changes

src/driver.cpp Outdated Show resolved Hide resolved

src/lexer.l Outdated Show resolved Hide resolved

wokron force-pushed the fix-scanner-push-back-overflow branch from 1d4bc53 to 311fbca Compare February 26, 2025 13:40

github-advanced-security bot found potential problems Feb 26, 2025

View reviewed changes

tests/parser.cpp Fixed Show fixed Hide fixed

jordalgo reviewed Feb 27, 2025

View reviewed changes

tests/parser.cpp Outdated Show resolved Hide resolved

wokron force-pushed the fix-scanner-push-back-overflow branch 2 times, most recently from 4013faa to 2c44af9 Compare February 28, 2025 09:29

jordalgo previously approved these changes Feb 28, 2025

View reviewed changes

danobi requested changes Mar 3, 2025

View reviewed changes

danobi reviewed Mar 17, 2025

View reviewed changes

wokron force-pushed the fix-scanner-push-back-overflow branch from e70e6f7 to 93f3c66 Compare March 18, 2025 07:12

wokron added 2 commits March 22, 2025 14:37

Move global variables in lexer.l into Driver class

43143ca

Signed-off-by: Yitang Yang <stringcatwok@gmail.com>

wokron force-pushed the fix-scanner-push-back-overflow branch from 93f3c66 to 43143ca Compare March 22, 2025 06:45

wokron dismissed jordalgo’s stale review via 43143ca March 22, 2025 22:17

danobi approved these changes Mar 22, 2025

View reviewed changes

danobi merged commit 4c01264 into bpftrace:master Mar 22, 2025
19 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix flex scanner push-back overflow issue #3824

Fix flex scanner push-back overflow issue #3824

wokron commented Feb 25, 2025 •

edited

Loading

jordalgo left a comment

wokron commented Feb 26, 2025 •

edited

Loading

viktormalik commented Feb 27, 2025

jordalgo left a comment

danobi left a comment

danobi Mar 3, 2025

danobi Mar 3, 2025

danobi Mar 3, 2025

wokron Mar 4, 2025

wokron Mar 4, 2025

danobi commented Mar 13, 2025

danobi Mar 17, 2025

wokron Mar 18, 2025

danobi Mar 21, 2025 •

edited

Loading

danobi commented Mar 21, 2025

danobi left a comment

viktormalik commented Mar 24, 2025

jordalgo commented Mar 24, 2025

jordalgo commented Mar 24, 2025

	void test(BPFtrace &bpftrace,
	const std::string &input,
	const std::string &output)
	{
	ast::ASTContext ast("stdin", input);
	auto ok = ast::PassManager()
	.put(ast)
	.put(bpftrace)
	.add(CreateParsePass())
	.add(ast::CreateDeprecatedPass())
	.run();
	ASSERT_TRUE(bool(ok));

	std::stringstream out;
	ast.diagnostics().emit(out);
	EXPECT_THAT(out.str(), HasSubstr(output));
	}

Fix flex scanner push-back overflow issue #3824

Fix flex scanner push-back overflow issue #3824

Conversation

wokron commented Feb 25, 2025 • edited Loading

Checklist

jordalgo left a comment

Choose a reason for hiding this comment

wokron commented Feb 26, 2025 • edited Loading

viktormalik commented Feb 27, 2025

jordalgo left a comment

Choose a reason for hiding this comment

danobi left a comment

Choose a reason for hiding this comment

danobi Mar 3, 2025

Choose a reason for hiding this comment

danobi Mar 3, 2025

Choose a reason for hiding this comment

danobi Mar 3, 2025

Choose a reason for hiding this comment

wokron Mar 4, 2025

Choose a reason for hiding this comment

wokron Mar 4, 2025

Choose a reason for hiding this comment

danobi commented Mar 13, 2025

danobi Mar 17, 2025

Choose a reason for hiding this comment

wokron Mar 18, 2025

Choose a reason for hiding this comment

danobi Mar 21, 2025 • edited Loading

Choose a reason for hiding this comment

danobi commented Mar 21, 2025

danobi left a comment

Choose a reason for hiding this comment

viktormalik commented Mar 24, 2025

jordalgo commented Mar 24, 2025

jordalgo commented Mar 24, 2025

wokron commented Feb 25, 2025 •

edited

Loading

wokron commented Feb 26, 2025 •

edited

Loading

danobi Mar 21, 2025 •

edited

Loading