Skip to content

[Clang] Force expressions with UO_Not to not be non-negative #126846

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Mar 10, 2025

Conversation

YutongZhuu
Copy link
Contributor

@YutongZhuu YutongZhuu commented Feb 12, 2025

Fix to issue #18878

This PR addresses the bug of not throwing warning for the following code:

int test13(unsigned a, int *b) {
        return a > ~(95 != *b); // expected-warning {{comparison of integers of different signs}}
}

However, in the original issue, a comment mentioned that negation, pre-increment, and pre-decrement operators are also incorrect in this case. For negation, I am not able to implement a trivial fix without failing other tests, I will research further and submit another PR for that if necessary. For pre-increment, and pre-decrement, I don't think any fixes are required since errors will be triggered if that is the case.

Edit: I was able to find a fix for the UO_Minus operation and included the fix in this PR too.

@llvmbot llvmbot added clang Clang issues not falling into any other category clang:frontend Language frontend issues, e.g. anything involving "Sema" labels Feb 12, 2025
@llvmbot
Copy link
Member

llvmbot commented Feb 12, 2025

@llvm/pr-subscribers-clang

Author: Yutong Zhu (YutongZhuu)

Changes

Fix to issue #18878

This PR issues the bug of not throwing warning for the following code:

int test13(unsigned a, int *b) {
        return a > ~(95 != *b); // expected-warning {{comparison of integers of different signs}}
}

However, in the original issue, a comment mentioned that negation, pre-increment, and pre-decrement operators are also incorrect in this case. For negation, I am not able to implement a trivial fix without failing other tests, I will research further and submit another PR for that if necessary. For pre-increment, and pre-decrement, I don't think any fixes are required since errors will be triggered if that is the case.


Full diff: https://git.1-hub.cnllvm/llvm-project/pull/126846.diff

3 Files Affected:

  • (modified) clang/docs/ReleaseNotes.rst (+2)
  • (modified) clang/lib/Sema/SemaChecking.cpp (+13-1)
  • (modified) clang/test/Sema/compare.c (+5)
diff --git a/clang/docs/ReleaseNotes.rst b/clang/docs/ReleaseNotes.rst
index 6344c4b36e357..e8aeaa6e514e4 100644
--- a/clang/docs/ReleaseNotes.rst
+++ b/clang/docs/ReleaseNotes.rst
@@ -135,6 +135,8 @@ Improvements to Clang's diagnostics
 - Fixed a bug where Clang's Analysis did not correctly model the destructor behavior of ``union`` members (#GH119415).
 - A statement attribute applied to a ``case`` label no longer suppresses
   'bypassing variable initialization' diagnostics (#84072).
+- The ``-Wsign-compare`` warning now treats expressions with bitwise NOT(~) as signed integers 
+  and throws warning if they are compared with unsigned integers (##18878).
 
 Improvements to Clang's time-trace
 ----------------------------------
diff --git a/clang/lib/Sema/SemaChecking.cpp b/clang/lib/Sema/SemaChecking.cpp
index 66c233de4ef30..37da477637c00 100644
--- a/clang/lib/Sema/SemaChecking.cpp
+++ b/clang/lib/Sema/SemaChecking.cpp
@@ -10069,6 +10069,16 @@ static std::optional<IntRange> TryGetExprRange(ASTContext &C, const Expr *E,
     case UO_AddrOf: // should be impossible
       return IntRange::forValueOfType(C, GetExprType(E));
 
+    case UO_Not:{
+      std::optional<IntRange> SubRange = TryGetExprRange(C, UO->getSubExpr(), MaxWidth, InConstantContext,
+                             Approximate);
+      if (!SubRange) 
+        return std::nullopt;
+      
+      // The width increments by 1 if the sub-expression cannot be negative since it now can be.
+      return IntRange(SubRange->Width + (int)SubRange->NonNegative, false);
+    }
+    
     default:
       return TryGetExprRange(C, UO->getSubExpr(), MaxWidth, InConstantContext,
                              Approximate);
@@ -10543,6 +10553,8 @@ static void AnalyzeComparison(Sema &S, BinaryOperator *E) {
   // Check to see if one of the (unmodified) operands is of different
   // signedness.
   Expr *signedOperand, *unsignedOperand;
+  llvm::outs() << LHS->getType()->hasSignedIntegerRepresentation() << "\n";
+  llvm::outs() << RHS->getType()->hasSignedIntegerRepresentation() << "\n";
   if (LHS->getType()->hasSignedIntegerRepresentation()) {
     assert(!RHS->getType()->hasSignedIntegerRepresentation() &&
            "unsigned comparison between two signed integer expressions?");
@@ -10561,7 +10573,7 @@ static void AnalyzeComparison(Sema &S, BinaryOperator *E) {
                       /*Approximate=*/true);
   if (!signedRange)
     return;
-
+  llvm::outs() << signedRange->NonNegative << "\n";
   // Go ahead and analyze implicit conversions in the operands.  Note
   // that we skip the implicit conversions on both sides.
   AnalyzeImplicitConversions(S, LHS, E->getOperatorLoc());
diff --git a/clang/test/Sema/compare.c b/clang/test/Sema/compare.c
index 17cf0351ef4f5..7abd6fb35976c 100644
--- a/clang/test/Sema/compare.c
+++ b/clang/test/Sema/compare.c
@@ -419,3 +419,8 @@ void pr36008(enum PR36008EnumTest lhs) {
   if (x == y) x = y; // no warning
   if (y == x) y = x; // no warning
 }
+
+
+int test13(unsigned a, int *b) {
+        return a > ~(95 != *b); // expected-warning {{comparison of integers of different signs}}
+}

Copy link

github-actions bot commented Feb 12, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

@YutongZhuu YutongZhuu force-pushed the dev_issue_18878 branch 2 times, most recently from 099a872 to dd3f28f Compare February 12, 2025 04:04
@YutongZhuu YutongZhuu changed the title Force expressions with UO_Not to not be non-negative [Clang] Force expressions with UO_Not to not be non-negative Feb 12, 2025
@YutongZhuu YutongZhuu force-pushed the dev_issue_18878 branch 2 times, most recently from d6d2e4d to 3dfac14 Compare March 2, 2025 19:24
Copy link
Collaborator

@zygoloid zygoloid left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code side of this looks good to me, but I think we should have more tests to make sure we're doing the right thing in the various corner cases (unsigned operations, negation of the most-negative value, etc).

@YutongZhuu
Copy link
Contributor Author

YutongZhuu commented Mar 3, 2025

The code side of this looks good to me, but I think we should have more tests to make sure we're doing the right thing in the various corner cases (unsigned operations, negation of the most-negative value, etc).

Added, are there anything else you would like add?

@zygoloid
Copy link
Collaborator

zygoloid commented Mar 3, 2025

The code side of this looks good to me, but I think we should have more tests to make sure we're doing the right thing in the various corner cases (unsigned operations, negation of the most-negative value, etc).

Added, are there anything else you would like add?

We should have some tests that check we're properly computing the IntRange. Eg:

bool f(int n) {
  // Should not warn. (Does warn with trunk.)
  return -(n & 15) <= -15;
}

bool g(int n) {
  // Should continue to warn.
  return -(n & 15) <= -17;
}

(The -16 case would ideally warn, but because we're only tracking active bits and not ranges, we can't tell that.)

Similarly:

bool f(short n) {
  // Should not warn, currently does.
  return -n == 32768;
}

bool g(short n) {
  // Should continue to warn.
  return -n == 65536;
}

(And again, cases between these two would ideally warn, but won't.)

And analogous cases for ~. Generally we want to have tests that exercise the behavior on both "sides" of where we switch between not warning and warning.

@YutongZhuu
Copy link
Contributor Author

The code side of this looks good to me, but I think we should have more tests to make sure we're doing the right thing in the various corner cases (unsigned operations, negation of the most-negative value, etc).

Added, are there anything else you would like add?

We should have some tests that check we're properly computing the IntRange. Eg:

bool f(int n) {
  // Should not warn. (Does warn with trunk.)
  return -(n & 15) <= -15;
}

bool g(int n) {
  // Should continue to warn.
  return -(n & 15) <= -17;
}

(The -16 case would ideally warn, but because we're only tracking active bits and not ranges, we can't tell that.)

Similarly:

bool f(short n) {
  // Should not warn, currently does.
  return -n == 32768;
}

bool g(short n) {
  // Should continue to warn.
  return -n == 65536;
}

(And again, cases between these two would ideally warn, but won't.)

And analogous cases for ~. Generally we want to have tests that exercise the behavior on both "sides" of where we switch between not warning and warning.

Added more tests, thanks for your patience!

Copy link
Collaborator

@zygoloid zygoloid left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great, thanks!

@YutongZhuu YutongZhuu requested a review from zygoloid March 7, 2025 02:53
@YutongZhuu
Copy link
Contributor Author

Sorry, I accidentally requested for a review. Did not mean it.

@cor3ntin
Copy link
Contributor

cor3ntin commented Mar 7, 2025

@YutongZhuu Will you need someone to merge this for you?

@cor3ntin cor3ntin merged commit 773e88f into llvm:main Mar 10, 2025
12 checks passed
@jplehr
Copy link
Contributor

jplehr commented Mar 10, 2025

Hi, I think this broke two of our buildbots -- for reasons I don't quite understand, but reverting fixed the issue locally.
Can you please take a look at it?

https://lab.llvm.org/buildbot/#/builders/10/builds/980
https://lab.llvm.org/buildbot/#/builders/73/builds/14304

@jplehr
Copy link
Contributor

jplehr commented Mar 10, 2025

Hi, I think this broke two of our buildbots -- for reasons I don't quite understand, but reverting fixed the issue locally. Can you please take a look at it?

https://lab.llvm.org/buildbot/#/builders/10/builds/980 https://lab.llvm.org/buildbot/#/builders/73/builds/14304

I realized that libc compiles with -Werror and it appears that ours are the only libc bots that are affected.
So, I guess, the issue is on our end. Thank you.

jplehr added a commit to jplehr/llvm-project that referenced this pull request Mar 10, 2025
This fixes a build issue on the AMDGPU libc bot after llvm#126846
landed that introduced a warning.
jplehr added a commit that referenced this pull request Mar 10, 2025
This fixes a build issue on the AMDGPU libc bot after
#126846 landed that introduced
a warning.

---------

Co-authored-by: Joseph Huber <huberjn@outlook.com>
jplehr added a commit to jplehr/llvm-project that referenced this pull request Mar 10, 2025
Those two instanced were also flagged with a warning after
llvm#126846 landed.

I believe using type T here makes sense, but I can very well be wrong.
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Mar 10, 2025
This fixes a build issue on the AMDGPU libc bot after
llvm/llvm-project#126846 landed that introduced
a warning.

---------

Co-authored-by: Joseph Huber <huberjn@outlook.com>
@glandium
Copy link
Contributor

This had the side effect of adding implicit-int-conversion warnings on e.g. the following code:

unsigned char foo(unsigned char x)
{
    return ~x;
}

This seems correct, but this should probably be highlighted in the release notes.

Another example is:

unsigned int foo(unsigned char x)
{
    return ~(1<<x);
}

@YutongZhuu
Copy link
Contributor Author

This had the side effect of adding implicit-int-conversion warnings on e.g. the following code:

unsigned char foo(unsigned char x)
{
    return ~x;
}

This seems correct, but this should probably be highlighted in the release notes.

Another example is:

unsigned int foo(unsigned char x)
{
    return ~(1<<x);
}

Hi, could you edit the release note for me? I don't have write access, so I would have to submit a PR to do so. I don't think a PR with changes only in the release notes is a great idea.

@erichkeane
Copy link
Collaborator

This had the side effect of adding implicit-int-conversion warnings on e.g. the following code:

unsigned char foo(unsigned char x)
{
    return ~x;
}

This seems correct, but this should probably be highlighted in the release notes.
Another example is:

unsigned int foo(unsigned char x)
{
    return ~(1<<x);
}

Hi, could you edit the release note for me? I don't have write access, so I would have to submit a PR to do so. I don't think a PR with changes only in the release notes is a great idea.

Once you're merged there is no way to edit the release note other than a PR. So whether or not it is a great idea, it is the only way to go :) Please submit a PR to update it and we can all review the wording

VishMCW pushed a commit to VishMCW/llvm-project that referenced this pull request Mar 18, 2025
This fixes a build issue on the AMDGPU libc bot after
llvm#126846 landed that introduced
a warning.

---------

Co-authored-by: Joseph Huber <huberjn@outlook.com>
@zeroomega
Copy link
Contributor

I think this change might went too far. We are seeing this check flagging code like this:

error: implicit conversion loses integer precision: 'int' to 'int8_t' (aka 'signed char') [-Werror,-Wimplicit-int-conversion]:
int8_t shift = ...
...
shift = -shift;
      ~ ^~~~~~

While technically this is an implicit int conversion, as '-' promote value of 'shift' from int8_t to int and the expression reassigns it back to int8_t. But the implicit conversion here has no side effects and it is a common practice. We appreciate the fact that clang now flagging cases like the one in the summary, but flagging/fixing the cases like the one above seems to be unnecessary. Would it be possible to provide better fine grained control over the -Wsign-compare so the clang could still flag the the cases like the one in the summary but ignore the cases where implicit conversion has no side effects or UB?

@YutongZhuu
Copy link
Contributor Author

I think this change might went too far. We are seeing this check flagging code like this:

error: implicit conversion loses integer precision: 'int' to 'int8_t' (aka 'signed char') [-Werror,-Wimplicit-int-conversion]:
int8_t shift = ...
...
shift = -shift;
      ~ ^~~~~~

While technically this is an implicit int conversion, as '-' promote value of 'shift' from int8_t to int and the expression reassigns it back to int8_t. But the implicit conversion here has no side effects and it is a common practice. We appreciate the fact that clang now flagging cases like the one in the summary, but flagging/fixing the cases like the one above seems to be unnecessary. Would it be possible to provide better fine grained control over the -Wsign-compare so the clang could still flag the the cases like the one in the summary but ignore the cases where implicit conversion has no side effects or UB?

I agree, I will look into this :)

@zygoloid
Copy link
Collaborator

Would it be possible to provide better fine grained control over the -Wsign-compare so the clang could still flag the the cases like the one in the summary but ignore the cases where implicit conversion has no side effects or UB?

The warning being produced is a -Wimplicit-int-conversion warning, not a -Wsign-compare warning, caused by us now noticing that if shift is -128, -shift will be (int)128, and the conversion back to int8_t changes the value. I could be wrong, but I don't think -Wimplicit-int-conversion flags any cases where the conversion has UB or side effects, so you could perhaps work around this by turning off that flag.

That said, it looks strange that this warns:

int8_t n = ...;
n = -n;

but this does not:

int8_t n = ...;
n = n + 1;

Both have the property that the right-hand side is sometimes 128, and if so, it gets implicitly wrapped to -128 by the conversion back to int8_t. Perhaps we should look at what + and - are doing and make -n behave the same as 0 - n?

@zeroomega
Copy link
Contributor

Would it be possible to provide better fine grained control over the -Wsign-compare so the clang could still flag the the cases like the one in the summary but ignore the cases where implicit conversion has no side effects or UB?

The warning being produced is a -Wimplicit-int-conversion warning, not a -Wsign-compare warning, caused by us now noticing that if shift is -128, -shift will be (int)128, and the conversion back to int8_t changes the value. I could be wrong, but I don't think -Wimplicit-int-conversion flags any cases where the conversion has UB or side effects, so you could perhaps work around this by turning off that flag.

That said, it looks strange that this warns:

int8_t n = ...;
n = -n;

but this does not:

int8_t n = ...;
n = n + 1;

Both have the property that the right-hand side is sometimes 128, and if so, it gets implicitly wrapped to -128 by the conversion back to int8_t. Perhaps we should look at what + and - are doing and make -n behave the same as 0 - n?

128 for int8_t is a valid point. Thanks for pointing that out.
But still I feel generate a warning for this case went too far. I did a comparison with gcc:

$ cat test_int.c
#include<inttypes.h>

int main() {
  int8_t a = 128;
  a = -a;
  uint8_t b = 0xFF;
  b = ~b;
  return 0;
}

$ gcc --version
gcc (Debian 14.2.0-3+build4) 14.2.0
Copyright (C) 2024 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

$ ../clang/clang_tot/bin/clang --version
Fuchsia clang version 21.0.0git (https://llvm.googlesource.com/llvm-project 2e6402ca2c6c33ccf41d74383a8e3afb82489410)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /mnt/nvme_crypt/SRC/clang/clang_tot/bin
Build config: +assertions
$ 
$  ../clang/clang_tot/bin/clang -Wimplicit-int-conversion test_int.c -o test_int
test_int.c:4:14: warning: implicit conversion from 'int' to 'int8_t' (aka 'signed char') changes value from 128 to -128 [-Wconstant-conversion]
    4 |   int8_t a = 128;
      |          ~   ^~~
test_int.c:5:7: warning: implicit conversion loses integer precision: 'int' to 'int8_t' (aka 'signed char') [-Wimplicit-int-conversion]
    5 |   a = -a;
      |     ~ ^~
test_int.c:7:7: warning: implicit conversion loses integer precision: 'int' to 'uint8_t' (aka 'unsigned char') [-Wimplicit-int-conversion]
    7 |   b = ~b;
      |     ~ ^~
3 warnings generated.
$ gcc -Wimplicit-int-conversion test_int.c -o test_int
gcc: error: unrecognized command-line option ‘-Wimplicit-int-conversion’; did you mean ‘-Wno-int-conversion’?
$ gcc -Wconversion test_int.c -o test_int
test_int.c: In function ‘main’:
test_int.c:4:14: warning: conversion from ‘int’ to ‘int8_t’ {aka ‘signed char’} changes value from ‘128’ to ‘-128’ [-Wconversion]
    4 |   int8_t a = 128;
      |              ^~~
$ gcc -Wint-conversion test_int.c -o test_int
$ gcc -Wall -Wextra test_int.c -o test_int
➜ 

As you can see, gcc doesn't generate warnings for the cases I raised. Only the -Wconversion does warn when the value is explicitly assigned to 128. If this is the default behavior of -Wimplicit-int-conversion from now on, quite a lot of embedded project would need fixes (could be unnecessary) or disable -Wimplicit-int-conversion entirely (which will miss the cases that mentioned in #18878)

@zygoloid
Copy link
Collaborator

But still I feel generate a warning for this case went too far.

Yeah, that's probably right. Maybe for - on a signed operand, we should just return the original range with the NonNegative flag cleared out, and shouldn't add the extra bit for the -128 -> 128 edge case. That's not technically correct, but probably is more useful in practice.

@zygoloid
Copy link
Collaborator

But still I feel generate a warning for this case went too far.

Yeah, that's probably right. Maybe for - on a signed operand, we should just return the original range with the NonNegative flag cleared out, and shouldn't add the extra bit for the -128 -> 128 edge case. That's not technically correct, but probably is more useful in practice.

Hm. That change will reintroduce a false positive warning for:

bool b(signed char c) {
  return -c >= 128;
}

... that this patch fixed. But we don't produce a false positive for 0 - c >= 128, so I still think what we ought to do here is to make -c behave the same way that 0 - c does.

@ahatanak
Copy link
Collaborator

ahatanak commented Apr 10, 2025

unsigned int foo(unsigned char x)
{
return ~(1<<x); // warning: implicit conversion loses integer precision: 'int' to 'unsigned int'
}

Could you help me understand why clang is emitting the warning?

Isn't it semantically equivalent to the following code?

unsigned int foo(unsigned char x)
{
     int i = ~(1<<x); // no -Wimplicit-int-conversion warning.
     return i; // no -Wimplicit-int-conversion warning.
}

@YutongZhuu
Copy link
Contributor Author

But still I feel generate a warning for this case went too far.

Yeah, that's probably right. Maybe for - on a signed operand, we should just return the original range with the NonNegative flag cleared out, and shouldn't add the extra bit for the -128 -> 128 edge case. That's not technically correct, but probably is more useful in practice.

Hm. That change will reintroduce a false positive warning for:

bool b(signed char c) {
  return -c >= 128;
}

... that this patch fixed. But we don't produce a false positive for 0 - c >= 128, so I still think what we ought to do here is to make -c behave the same way that 0 - c does.

I don't think I can fix the problem without re-introducing the false positive in TryGetExprRange. By looking at the AST, the only difference is that

unsigned char b = -a 

has a implCast to unsigned char right before the unary operator but the false positive case does not. By the time that the expression is passed into TryGetExprRange, the information about the implCast has already lost (And I cannot find any APIs that is able to get the parent expression or something like that). I think it would be better to re-introduce the false positive because I think having the implicit-int-conversion warning sounds more unacceptable. Or you have any other suggestions on differentiating between the two cases?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:frontend Language frontend issues, e.g. anything involving "Sema" clang Clang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants