Skip to content

Emit warning with attached Diagnostic when doing = NULL #14434

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
Tracked by #14429
eliaperantoni opened this issue Feb 3, 2025 · 10 comments · May be fixed by #15696
Open
Tracked by #14429

Emit warning with attached Diagnostic when doing = NULL #14434

eliaperantoni opened this issue Feb 3, 2025 · 10 comments · May be fixed by #15696
Assignees
Labels
enhancement New feature or request good first issue Good for newcomers

Comments

@eliaperantoni
Copy link
Contributor

Is your feature request related to a problem or challenge?

For a query like:

SELECT * FROM users WHERE password = NULL

DataFusion does not emit any error. Of course this a valid query, but it's very likely that the user intended to do password IS NULL instead.

We want to provide a rich message that references and highlights locations in the original SQL query, and contextualises and helps the user understand the pitfall. In the end, it would be possible to display a warning in a fashion akin to what was enabled by #13664 for some (fatal) errors:

See #14429 for more information.

Describe the solution you'd like

Attach a well crafted Diagnostic to the DataFusionError, building on top of the foundations laid in #13664. See #14429 for more information.

There would probably need to be some plumbing required to make DataFusion collect warnings without halting the overall execution. In this scenario, we want to use Diagnostic::new_warning.

Describe alternatives you've considered

No response

Additional context

No response

@eliaperantoni eliaperantoni added the enhancement New feature or request label Feb 3, 2025
@alamb alamb added the good first issue Good for newcomers label Feb 4, 2025
@alamb
Copy link
Contributor

alamb commented Feb 4, 2025

I think this is a good first issue as the need is clear and the tests in https://github.com/apache/datafusion/blob/85fbde2661bdb462fc498dc18f055c44f229604c/datafusion/sql/tests/cases/diagnostic.rs are well structured for extension.

@ugoa
Copy link
Contributor

ugoa commented Feb 9, 2025

take

@eliaperantoni
Copy link
Contributor Author

Hey @ugoa how is it going with this ticket :) Can I help with anything?

@ugoa
Copy link
Contributor

ugoa commented Feb 15, 2025

Yeah I am working on it this weekend

@changsun20
Copy link
Contributor

Hi @ugoa, I noticed that this ticket was assigned to you a couple weeks ago. May I ask if any progress has been made? If you are willing to cooperate, I would be happy to submit a PR to help resolve this issue. Please let me know if this works for you. Appreciate your efforts!

@changsun20
Copy link
Contributor

changsun20 commented Mar 15, 2025

Hi @eliaperantoni,

Thank you raising this issue. As I explore potential implementations, I’d appreciate your insights on a few key points:

1. Warning Scope for = NULL Patterns
Could you help confirm the appropriate scope for detecting = NULL comparisons?

  • Option 1: Emit warnings universally whenever = NULL is parsed.
    • Potential Risk: Trigger warnings whenever "= NULL" is parsed. However, expressions like UPDATE users SET password = NULL would be legitimate usage.
    • Note: Since Dml(Update) isn't currently supported in DataFusion, this might not be an immediate issue but could become one.
  • Option 2: Restrict warnings to predicate contexts (WHERE/JOIN/HAVING/CASE WHEN clauses).
    • Tradeoff: This would be more precise but adds complexity.

2. Non-Fatal Warning Plumbing
You mentioned the need to implement plumbing and collect warnings via Diagnostic::new_warning. Could you please elaborate more on this? Can you briefly outline (or just give me a hint) how to propagate warnings without interrupting the execution flow? I recognize this may require deeper code context and may be too general to answer—feel free to defer discussion until I share a draft implementation!

3. Span Highlighting Strategy
For the query SELECT * FROM users WHERE password = NULL, the parsed structure shows:

BinaryOp {
    left: Identifier(
        Ident {
            value: "password",
            quote_style: None,
            span: Span(Location(1,27)..Location(1,35)),
        },
    ),
    op: Eq,
    right: Value(
        Null,
    ),
},

I have come up with some possible approaches:

  • Option 1: Add Span to op: Eq and/or right: Value(Null) - most precise but risks breaking changes elsewhere
  • Option 2: Use the password identifier’s span - simplest approach, but highlights the column name rather than the actual issue ("= NULL").
  • Option 3: Use the identifier's Span as a starting point and estimate the entire expression's span - avoids structural changes but may be inaccurate with variable whitespace

What would be the best approach in your opinion?

Thank you for your guidance!

@eliaperantoni
Copy link
Contributor Author

Thank you @changsun20 for the detailed analysis 🙏

1. Warning Scope for = NULL Patterns

I would say that, if's not too complicated, we should go for the semantic approach. i.e. only raise a warning when = NULL is being used as a condition.

2. Non-Fatal Warning Plumbing

This is a complicated one! 😃 You hit the nail on the head: the complicated part is bubbling up a warning without interrupting everything. I haven't thought too much about how to solve it.

Perhaps one way would be to make a type DatafusionResult<R> = Result<(R, Vec<DatafusionWarning>), DatafusionError> so that a fatal error can still be bubbled up with ?, and then a WarningAccumulator could help with the aggregation of warnings that arise in a function body. i.e.

fn parse_query(sql: &str) -> DatafusionResult<LogicalPlan> {
    let mut warnings = WarningAccumulator::new();
	
	// WarningAccumulator::extend(&mut self) takes a DatafusionResult<R> and
	// returns Result<R>, while storing any warnings in itself.
	let ctes = warnings.extend(parse_ctes(sql))?;
	let mut logical_plan = warnings.extend(parse_select(sql, ctes))?;
	warnings.extend(parse_where(&mut logical_plan, sql))?;

	// WarningAccumulator::into returns Vec<DatafusionWarning>
	DatafusionResult::ok_with_warnings(logical_plan, warnings.into())
}

But I believe that's a bit invasive because it requires replacing Result with DatafusionResult all the way up to the outermost function call. It also pollutes the code with a lot of warnings.extend calls. Though perhaps this would be the most robust solution in the long term.

Alternatively, to start with we could add a field to SqlToRel where warnings are collected (i.e. just a warnings: Vec<Diagnostic> would work imo, no need for a new type), and then the user can get them out if they so wish. The only problem I see with this is that most (all?) methods of SqlToRel take an immutable reference. So you'd either have to use interior mutability, or change most receivers to &mut self. The latter is definitely a breaking change so I'd not encourage going that route. I think interior mutability with RefCell is fine.

3. Span Highlighting Strategy

Add Span to op: Eq and/or right: Value(Null) - most precise but risks breaking changes elsewhere

This means changing the sqlparser dependency, that's the only component that can give you spans. Actually, someone already added spans to Value (apache/datafusion-sqlparser-rs#1738). Perhaps you could bump Datafusion's depedency on sqlparser to get that?

If that doesn't work, I would suggest perhaps going the easy route of highlighting the expression and saying "this expression is being compared with NULL and that's always falsey, perhaps you meant IS NULL" or something like that. It's not the best but it's the most we can do without more information from sqlparser :) I think trying to find the IS NULL by means of character operations is not worth the complexity and risks missing some edges cases.

@changsun20
Copy link
Contributor

Hi @eliaperantoni,

Thank you for the detailed guidance! Here's my understanding of the next steps:

  1. Warning Scope
    I'll implement the semantic approach to only raise warnings when = NULL is used as a condition, which makes the most sense functionally.

  2. Warning Plumbing
    Both approaches you outlined are helpful. I'll start with the interior mutability approach using warnings: RefCell<Vec<Diagnostic>> in SqlToRel as it seems to be the least disruptive path forward. The DatafusionResult type with WarningAccumulator pattern offers an interesting, more robust solution that we could consider for the longer term.

  3. Span Highlighting
    I've checked PR #1738 and found that the feature is available in sqlparser version 0.55.0. However, the main codebase can't be upgraded to this version directly at the moment. There's a WIP PR(chore(deps): Update sqlparser to 0.55.0 #15183) for this dependency upgrade, so I believe this won't be a problem soon.

Thanks again for your guidance. I'll start working on an implementation based on these suggestions. Appreciate your patience as I navigate the codebase!

@changsun20
Copy link
Contributor

Quick update: I was a bit busy with schoolwork last week, but I’ll try to fix this ticket this week. Thanks for your patience!

@eliaperantoni
Copy link
Contributor Author

Quick update: I was a bit busy with schoolwork last week, but I’ll try to fix this ticket this week. Thanks for your patience!

No worries at all! <3

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request good first issue Good for newcomers
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants