-
-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix flex scanner push-back overflow issue #3824
Fix flex scanner push-back overflow issue #3824
Conversation
8022df4
to
966e26d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for working on this! I see in the issue that @ajor has a minimal repo. Do you mind adding a test that failed before and now passes with this change?
1d4bc53
to
311fbca
Compare
I added two test cases that would previously cause a "flex scanner push-back overflow" error.
This case easily exceeds the recursion limit due to the recursive macro M.
In this case, the source string initially fills the entire buffer. The recursive macro will fill the buffer again before reaching the recursion limit. The new code will detect this situation. |
@jordalgo I assume that we'll want to backport this to the 0.23.0 release? |
4013faa
to
2c44af9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks for the contribution!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Appreciate the fix, but the biggest issue IMO is it reaches too deep into flex internals. I think we need a way to do this with public API.
src/driver.cpp
Outdated
@@ -7,8 +7,12 @@ | |||
|
|||
struct yy_buffer_state; | |||
|
|||
extern struct yy_buffer_state *yy_scan_string(const char *yy_str, | |||
yyscan_t yyscanner); | |||
#ifndef MIN_YY_BUF_SIZE |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not some preprocessor definition exported by flex, right? IMO better to drop the #ifndef
- it's confusing b/c it has no users (overrides) in our codebase.
src/driver.cpp
Outdated
auto &source = Log::get().get_source(); | ||
yy_scan_string_with_size(source.c_str(), | ||
std::max(source.size(), | ||
static_cast<size_t>(MIN_YY_BUF_SIZE)), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you need this to be size_t
, maybe just declare it as static constexpr size_t
, then? Instead of a macro
src/lexer.l
Outdated
YY_BUFFER_STATE yy_scan_string_with_size(const char *yystr, size_t size, yyscan_t yyscanner) { | ||
YY_BUFFER_STATE b = yy_scan_string(yystr, yyscanner); | ||
|
||
// If we use yy_scan_string(), the buffer size is set to strlen(yystr) + 2. | ||
// So we need to adjust the buffer to the given size. | ||
b->yy_buf_pos = b->yy_ch_buf = (char *)yyrealloc((void *)b->yy_ch_buf, size + 2, yyscanner); | ||
b->yy_buf_size = size; | ||
|
||
return b; | ||
} | ||
|
||
bool yy_is_buffer_full(struct yyguts_t *yyg) { | ||
char *yy_cp = yyg->yy_c_buf_p; | ||
return (yy_cp < YY_CURRENT_BUFFER_LVALUE->yy_ch_buf + 2 && | ||
YY_CURRENT_BUFFER_LVALUE->yy_n_chars == YY_CURRENT_BUFFER_LVALUE->yy_buf_size); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While I'll grant this looks like it works (for now at least), it seems like this reaches way into the internals of flex. I suspect this will cause maintenance issues sooner or later, as I don't see any of these fields/types reference in the flex API.
Please consult https://westes.github.io/flex/manual/Index-of-Variables.html#Index-of-Variables and only use things in public API (or prove that it's indeed intended to be public but has no listing).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right, I may have focused too much on the flex-generated lex.yy.cc
file. So I reviewed the flex documentation and came up with another approach (2c44af9). This method is documented (YY_INPUT, overriding). If it is better, I can squash this commit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Basically, the new approach is to use a custom function to override the behavior of YY_INPUT
, changing the input from a file to the source string we provide. In this case, the buffer size is the default YY_BUF_SIZE
, and each read will copy up to YY_READ_BUF_SIZE
bytes (half of the buffer size). This gives us enough space for macro expansion (larger than the macro recursion limit of 1000 bytes).
Sorry about the delay - this slipped off my radar. I will take a look this weekend. |
tests/parser.cpp
Outdated
BPFtrace bpftrace; | ||
|
||
// A recursive macro expansion | ||
bpftrace.macros_.emplace("M", "M+1"); | ||
test_parse_failure(bpftrace, | ||
"#define M M+1\n" | ||
"BEGIN { M; }", | ||
R"( | ||
stdin:2:9-343: ERROR: Macro recursion limit reached: M, M+1 | ||
BEGIN { M; } | ||
~~~~ | ||
stdin:2:9-343: ERROR: syntax error, unexpected end of file | ||
BEGIN { M; } | ||
~~~~ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think parser inserts this, right? So not necessary to do anything fancy.
BPFtrace bpftrace; | |
// A recursive macro expansion | |
bpftrace.macros_.emplace("M", "M+1"); | |
test_parse_failure(bpftrace, | |
"#define M M+1\n" | |
"BEGIN { M; }", | |
R"( | |
stdin:2:9-343: ERROR: Macro recursion limit reached: M, M+1 | |
BEGIN { M; } | |
~~~~ | |
stdin:2:9-343: ERROR: syntax error, unexpected end of file | |
BEGIN { M; } | |
~~~~ | |
test_parse_failure("#define M M+1\n" | |
"BEGIN { M; }", | |
R"( | |
stdin:2:9-343: ERROR: Macro recursion limit reached: M, M+1 | |
BEGIN { M; } | |
~~~~ | |
stdin:2:9-343: ERROR: syntax error, unexpected end of file | |
BEGIN { M; } | |
~~~~ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, it doesn't 😂. Macro parsing is done by ClangParser
(#3824 (comment)). I have added a comment to clarify this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, got it. How about writing a new helper, maybe test_macro_parse_failure(...)
? We recently overhauled the pass manager, so should be straightforward. Example of a helper I recently added:
Lines 11 to 27 in 882c183
void test(BPFtrace &bpftrace, | |
const std::string &input, | |
const std::string &output) | |
{ | |
ast::ASTContext ast("stdin", input); | |
auto ok = ast::PassManager() | |
.put(ast) | |
.put(bpftrace) | |
.add(CreateParsePass()) | |
.add(ast::CreateDeprecatedPass()) | |
.run(); | |
ASSERT_TRUE(bool(ok)); | |
std::stringstream out; | |
ast.diagnostics().emit(out); | |
EXPECT_THAT(out.str(), HasSubstr(output)); | |
} |
Instead of ast::CreateDeprecatedPass()
you'd have CreateClangPass()
Reason is it'd be good to move the tests towards stricter input -> output and avoid reaching into classes to mock.
e70e6f7
to
93f3c66
Compare
Minor nit on the unit tests but LGTM otherwise. |
If we use yy_scan_string to scan the source string, the buffer size will be set to the size of the string, rather than the default YY_BUF_SIZE. This can easily cause a "push-back overflow" issue when using the unput() function due to the smaller buffer size. Previously, we assumed the remaining buffer size was always YY_BUF_SIZE, but unparsed strings were still stored in the buffer. This also led to "push-back overflow" issues. We solve this issue by overriding YY_INPUT. In this case, the buffer size will be the default YY_BUF_SIZE (16384), and each read will only copy YY_READ_BUF_SIZE (8192) bytes of data to the buffer. This means we have 8192 bytes available for macro expansion, which is greater than the expansion limit of 1000 bytes. Signed-off-by: Yitang Yang <stringcatwok@gmail.com>
Signed-off-by: Yitang Yang <stringcatwok@gmail.com>
93f3c66
to
43143ca
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for working on this!
Do we want to backport this to the 0.23.x branch? |
@viktormalik Good reminder. I'll put up a backport PR. |
@viktormalik These commits aren't cherry-picking cleanly onto the release branch. I'll poke at it a little longer but maybe we should just leave it out of the release branch (unless someone else has the time to do this safely/confidently). |
Fixes #3215
This issue was caused by two factors:
yy_create_buffer()
, the buffer created by callingyy_scan_string()
was only the size of the string, notYY_BUF_SIZE
, which makes it easy to encounter buffer overflow.YY_BUF_SIZE
was assumed to be the remaining buffer size, leading to calls tounput()
even when the buffer was full. (Fix lexer buffer size check #2313)I defined the
yy_scan_string_with_size()
function to create a buffer of a specified size (MAX_BUF_SIZE
). And defined theyy_is_buffer_full()
function to check if the buffer is full, preventing calls tounput()
when the buffer is already full.Checklist
man/adoc/bpftrace.adoc
CHANGELOG.md