
Add Fuzz targets #262

Open · wants to merge 22 commits into base: main
Conversation

@pcwizz commented Feb 26, 2025

Adds a selection of fuzz targets for jaq core. Lexer types are made public to enable direct fuzzing of the parser.

@01mf02 (Owner) commented Mar 4, 2025

Thanks a lot for your work, @pcwizz!

```rust
use libfuzzer_sys::fuzz_target;

fuzz_target!(|code: &str| {
    if code.contains("\"") || code.contains("main") {
```
01mf02 (Owner): What is the purpose of this, especially the "main" part?

pcwizz (Author): There are some shallow crashes on inputs with multiple mains, for instance. This bit just filters the inputs so I could look for deeper crashes.

01mf02 (Owner): Thanks for the explanation. However, in jq, having multiple mains is perfectly valid: for example, def main: 1; def main: 2; main is OK and returns 2. Also, I find it regrettable that filtering out any quotes means none of the string parsing gets fuzzed.

Suppose we removed the if code.contains ... line. How can we evaluate the effectiveness of that change? In other words, how can we know whether this line helps or hinders the search for deeper crashes? Do you measure some kind of code coverage?

pcwizz (Author): You can generate coverage reports: https://rust-fuzz.github.io/book/cargo-fuzz/coverage.html#generate-code-coverage-data

This should tell you which code paths are being hit.

I was quite tempted to remove those filters myself before committing, but decided instead to commit the target as I had used it. My general rule of thumb: if I get into a cycle of hitting the same crash a couple of times, I adjust the target to avoid it.

@01mf02 (Owner) commented Mar 4, 2025

I have now implemented Arbitrary manually for token types. This should prevent the generation of many tokens that the lexer would never output, and which the parser is therefore not supposed to handle. I will also document these invariants in the type definitions of Token and StrPart.

I noticed that the fuzzer now prints much less output. I would be very grateful if you check whether this is OK or whether I screwed up the Arbitrary implementation.

Also, I had a bit of a problem generating a non-empty &str (in Token::arbitrary). I now just fail when a generated &str is empty, but this is a bit suboptimal, because it wastes computation power. If you have an idea how to do this better, I'm all ears!
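One stdlib-only way to sidestep the problem is to make non-emptiness hold by construction instead of failing. The sketch below is not jaq's code: `take_str` is a hypothetical stand-in for the arbitrary crate's `Unstructured`, and the `"a"` fallback is an arbitrary choice, but it contrasts rejecting an empty draw with repairing it:

```rust
// Sketch (not jaq's code): two ways to get a non-empty &str from raw fuzz
// bytes. `take_str` is a hypothetical stand-in for arbitrary's `Unstructured`.

/// Take up to 8 bytes from the front of `data` and keep the longest
/// valid UTF-8 prefix of them.
fn take_str<'a>(data: &mut &'a [u8]) -> &'a str {
    let s = *data;
    let n = s.len().min(8);
    let (head, rest) = s.split_at(n);
    *data = rest;
    match std::str::from_utf8(head) {
        Ok(s) => s,
        Err(e) => std::str::from_utf8(&head[..e.valid_up_to()]).unwrap(),
    }
}

/// Variant 1: reject empty strings. This is what failing in
/// `Token::arbitrary` amounts to: the whole input is wasted.
fn nonempty_reject<'a>(data: &mut &'a [u8]) -> Option<&'a str> {
    let s = take_str(data);
    if s.is_empty() { None } else { Some(s) }
}

/// Variant 2: non-empty by construction. Substitute a fixed one-character
/// string instead of failing, so no input is ever discarded.
fn nonempty_fallback<'a>(data: &mut &'a [u8]) -> &'a str {
    let s = take_str(data);
    if s.is_empty() { "a" } else { s }
}
```

The fallback skews the distribution slightly toward the substitute string, but it keeps every fuzzer input productive rather than discarding it.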

@pcwizz (Author) commented Mar 4, 2025

> I have now implemented Arbitrary manually for token types. This should prevent the generation of many tokens that the lexer would never output, and which the parser is therefore not supposed to handle. I will also document these invariants in the type definitions of Token and StrPart.
>
> I noticed that the fuzzer now prints much less output. I would be very grateful if you check whether this is OK or whether I screwed up the Arbitrary implementation.
>
> Also, I had a bit of a problem generating a non-empty &str (in Token::arbitrary). I now just fail when a generated &str is empty, but this is a bit suboptimal, because it wastes computation power. If you have an idea how to do this better, I'm all ears!

Sure, I will take a look.

@01mf02 (Owner) commented Mar 4, 2025

Inspired by your comment:

> You can add your own inputs here (e.g. valid jaq programs) as a starting corpus to bootstrap your fuzzing efforts deeper in jaq's core.

I just wrote a script that makes a nice initial fuzzing corpus.
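A corpus-seeding script of this kind can be very small. The sketch below is not the script from this PR: the `examples/*.jq` glob and the `parse` target name are assumptions, but it shows the general shape of copying known-valid programs into the directory cargo-fuzz reads seeds from:

```shell
#!/bin/sh
# Hypothetical sketch: seed a cargo-fuzz corpus from example jq programs.
# The paths and target name are assumptions, not the ones used in this PR.
set -eu

corpus=fuzz/corpus/parse
mkdir -p "$corpus"

i=0
for f in examples/*.jq; do
    [ -e "$f" ] || continue     # skip if the glob matched nothing
    # cargo-fuzz does not care about file names, only contents
    cp "$f" "$corpus/seed_$i"
    i=$((i + 1))
done
```

Checking the resulting corpus into the repository (or regenerating it deterministically) keeps the fuzzing setup reproducible.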

pcwizz (Author): I think as long as the Parser's interface documentation states that illegal tokens will result in panics, it is reasonable to narrow the fuzzing inputs to what should be valid tokens.

Implementing Arbitrary directly in the module certainly means a lot less boilerplate. If you are happy with including the arbitrary dependency, then this is a nice thing to do.

```diff
@@ -1,5 +1,7 @@
 //! Combined file loading, lexing, and parsing for multiple modules.

+#[cfg(feature = "arbitrary")]
```
pcwizz (Author): Aha, nice! That will also solve the dependency question.

@pcwizz (Author) commented Mar 6, 2025

> I have now implemented Arbitrary manually for token types. This should prevent the generation of many tokens that the lexer would never output, and which the parser is therefore not supposed to handle. I will also document these invariants in the type definitions of Token and StrPart.
>
> I noticed that the fuzzer now prints much less output. I would be very grateful if you check whether this is OK or whether I screwed up the Arbitrary implementation.
>
> Also, I had a bit of a problem generating a non-empty &str (in Token::arbitrary). I now just fail when a generated &str is empty, but this is a bit suboptimal, because it wastes computation power. If you have an idea how to do this better, I'm all ears!

If you can discard an input as quickly as possible, the fuzzing engine will decide it is uninteresting relatively quickly. To some extent, wasting some time generating invalid inputs is always going to be part of the game. A good starting corpus will save you some of this waste on startup. The most important thing is that it is reproducible.
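The "discard as quickly as possible" advice can be made concrete: run the cheapest byte-level checks before any expensive work. A stdlib-only sketch of that ordering, where `expensive_parse` is a hypothetical placeholder for jaq's lexer and parser:

```rust
// Sketch: cheap prefilter before expensive work, so uninteresting inputs
// are discarded as fast as possible. `expensive_parse` is a placeholder
// standing in for jaq's lexing and parsing.

fn expensive_parse(code: &str) -> usize {
    // placeholder "work": count whitespace-separated tokens
    code.split_whitespace().count()
}

fn fuzz_one(data: &[u8]) -> Option<usize> {
    // cheapest checks first: UTF-8 validity, then byte-level filters
    let code = std::str::from_utf8(data).ok()?;
    if code.contains('"') || code.contains("main") {
        return None; // known shallow-crash territory: bail out early
    }
    Some(expensive_parse(code))
}
```

Returning early like this (before allocation or parsing) is what lets the engine classify a filtered input as uninteresting almost for free.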

2 participants