-
Notifications
You must be signed in to change notification settings - Fork 83
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Fuzz targets #262
base: main
Are you sure you want to change the base?
Add Fuzz targets #262
Conversation
The type has been deemed to be sufficiently stable. Exposing the public interface will allow for more direct testing.
Thanks a lot for your work, @pcwizz! |
use libfuzzer_sys::fuzz_target; | ||
|
||
fuzz_target!(|code: &str| { | ||
if code.contains("\"") || code.contains("main") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the purpose of this, especially the "main" part?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are some shallow crashes on intputs with multiple mains for instance. This bit is just to filter the inputs so I could look for deeper crashes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the explanation. However, in jq, having multiple mains is perfectly valid: For example def main: 1; def main: 2; main
is OK and returns 2
. Also, I find regrettable that filtering out any quotes will not fuzz any of the string parsing.
Suppose that we would remove the if code.contains ...
line. How can we evaluate the effectiveness of that change? In other words, how can we know whether this line helps or prevents looking for deeper crashes? Do you measure some kind of code coverage?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can generate coverage reports: https://rust-fuzz.github.io/book/cargo-fuzz/coverage.html#generate-code-coverage-data
This should tell you which code paths are being hit.
I was quite tempted to remove those filters myself before committing, but decided instead to commit it as used. My general rule of thumb is that if I get in the cycle of hitting the same crash a couple of times then I tend to adjust the target to avoid it.
I have now implemented I noticed that the fuzzer now prints much less output. I would be very grateful if you check whether this is OK or whether I screwed up the Also, I had a bit of a problem to generate a non-empty |
Sure I will take a look. |
Inspired by your comment:
I just wrote a script that makes a nice initial fuzzing corpus. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think as long as the Parser's interface documentation states that illegal tokens will result in panics then it is reasonable to narrow the fuzzing inputs to what should be valid tokens.
Implementing arbitrary directly in the module is certainly a lot less boiler plate. If you are happy with including the arbitrary dependency then this is a nice thing to do.
@@ -1,5 +1,7 @@ | |||
//! Combined file loading, lexing, and parsing for multiple modules. | |||
|
|||
#[cfg(feature = "arbitrary")] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aha nice that will also solve the dependency question.
If you can discard the input as quickly as possible then the fuzzing engine will decide it is uninteresting relatively quickly. To some extend wasting some time generating invalid inputs is always going to be part of the game. Having a good starting corpus will save you some of this waste on startup. The most important thing is that it is reproducible. |
Adds a selection of fuzz targets for jaq core. Lexer types are made public to enable direct fuzzing of the parser.