Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clean up a bunch of compiler warnings #5

Merged
merged 9 commits into from
Jan 29, 2025

Conversation

GMNGeoffrey
Copy link
Collaborator

@GMNGeoffrey GMNGeoffrey commented Jan 17, 2025

Maybe these only show up with clang-19, but they made it really hard to look for actual problems. Hopefully this is something we could upstream. The only one that I think we really do need for hipification is expansion-to-defined, since that is actually undefined behavior.

I handled warnings in the following ways;

  1. Where the warning was from within the project and there was an obvious simple fix, I modified the source code:
    • -Wexpansion-to-defined
    • -Wunused
    • -Wmismatched-tags
    • -Winconsistent-missing-override
    • -Wparentheses
    • -Wcuda-compat
  2. Where the warning was in a third_party library and a relatively harmless one to have in a third_party I silenced the warning for that library:
    • -Wunused
    • -Wabsolute-value
    • -Wimplicit-exception-spec-mismatch
    • -Wdeprecated
  3. Explicit template instantiations are widespread across the project and look intentional, so I silenced the warning project-wide.
    • -Winstantiation-after-specialization
  4. There are a few warnings that I think should probably be fixed in the project, but are are not egregious and aren't really relevant for our work, so I just silenced them in my own CMakePresets.json.
    • -Wunneeded-internal-declaration
    • -Wdeprecated
    • -Wvla-cxx-extension
  5. "no case matching constant switch condition" can't be turned off, so I updated the third_party source (which is copied in-tree anyway). Maybe we can upstream that at some point.
  6. There's one other warning that looks like it could be related to hipification. Now we can actually see it! Based on discussions like https://gitlab.com/libeigen/eigen/-/issues/1751, I'm not sure if these are a problem or not or how to best handle it, so I haven't fixed it.
    • -Wpass-failed=transform-warning

I can now build with -Werror:

ROCM_PATH="${HOME}/rocm/opt/rocm" \
CC=/usr/bin/clang-19 \
CXX=/usr/bin/clang++-19 \
/usr/local/bin/cmake \
  -DUSE_ROCM=ON \
  -DCMAKE_PREFIX_PATH=/home/gcmn/rocm/opt/rocm \
  "-DCMAKE_HIP_FLAGS=-Wno-deprecated -Wno-pass-failed -Werror" \
  -DCMAKE_POSITION_INDEPENDENT_CODE=ON \
  -DROCM_WARN_TOOLCHAIN_VAR=OFF \
  -DCMAKE_EXPORT_COMPILE_COMMANDS=ON \
  "-DCMAKE_C_FLAGS=-Wno-deprecated -Wno-unneeded-internal-declaration -Wno-vla-cxx-extension -Werror" \
  "-DCMAKE_CXX_FLAGS=-Wno-deprecated -Wno-unneeded-internal-declaration -Wno-vla-cxx-extension -Werror" \
  -DCMAKE_INSTALL_PREFIX=/home/gcmn/src/dgl/out/install/rocm \
  -DCMAKE_C_COMPILER=/usr/bin/clang-19 \
  -DCMAKE_CXX_COMPILER=/usr/bin/clang++-19 \
  -DCMAKE_C_COMPILER_LAUNCHER=ccache \
  -DCMAKE_CXX_COMPILER_LAUNCHER=ccache \
  -DCMAKE_BUILD_TYPE=Debug \
  -DBUILD_TYPE=dev \
  -DBUILD_CPP_TEST=ON \
  -DBUILD_GRAPHBOLT=OFF \
  -DBUILD_SPARSE=ON \
  -DUSE_LIBXSMM=OFF \
  -DUSE_OPENMP=ON \
  -DPython3_FIND_VIRTUALENV=ONLY \
  -S/home/gcmn/src/dgl \
  -B/home/gcmn/src/dgl/out/build/rocm \
  -G Ninja

GMNGeoffrey and others added 8 commits January 16, 2025 13:59
This is just the output of the hipify-inplace.sh and
hipify-tensoradapter.py scripts with no further modifications. I think
it's easier to review changes to the actual HIP source rather than
trying to thing about what the hipify script will do and since there are
a fair number of changes to review, that seems worth it. In the end we
should have a bunch of hip source files and .prehip cuda files that they
can be generated from. Then we can handle organization however we want:
restore the originals and have hipification be part of a build process,
have the hip versions on a separate branch, etc.

I'll check in the .prehip files in a separate commit to keep things a
bit cleaner.
These all get the .prehip extension appended.
In my porting, I was finding it really annoying that everything in DGL
was hardcoded to the directory build/ and that it created build/
directories in various source subdirectories (which meant cleaning and
rebuilding was fraught). I modified things so that all sub-builds happen
in the main build directory.

There were also some bugs in the shell scripts and I cleaned them up a
bit to make them more robust.

Not all of this is strictly required for ROCM to build, so we might want
to strip it out. I already stripped out various warning silencing for
that reason.
This is what I get for not copying exactly what I had.
There are definitely still runtime issues, but this handles all of the
compilation failures (requires clang-19 and bleeding-edge ROCM). Mostly
straightforward. The ones that aren't.

- The AtomicFPOp was adapated from
[PyTorch](https://github.com/pytorch/pytorch/blob/bef103934a25d848838a7642a8d6a2f523e7dfff/aten/src/ATen/cuda/Atomic.cuh#L39).
- The handling of legacy cusparse is something I messed up originally.
  The various CUDA version checks need to be in sync because the legacy
  version creates a transposed output that then needs to be flipped
  back, so I factored out a shared macro.
- There is something weird where HIP doesn't like the logging function.
  I was never able to get it to work, but it doesn't seem like hugely
  important, so I think it's ok to punt for now.

A couple of these changes might not be strictly necessary to make the
build work (like adding `__HIP_DEVICE_COMPILE__` to some of the
`__CUDA_ARCH__` checks) because I grabbed changes from my fully working
draft by file and just reverted the obviously more complicated ones. It
didn't seem worth reverting these uncomplicated ones.
Maybe these only show up with clang-19, but they made it really hard to
look for actual problems. Hopefully this is something we could upstream.
The only one that I think we really do need for hipification is
`expansion-to-defined`, since that is actually undefined behavior.

I handled warnings in one of these ways;
1. Where the warning was from within the project and there was an
   obvious simple fix, I modified the source code:
     -  `-Wexpansion-to-defined`
     -  `-Wunused`
     -  `-Wmismatched-tags`
     -  `-Winconsistent-missing-override`
     -  `-Wparentheses`
     -  `-Wcuda-compat`
2. Where the warning was in a third_party library and a relatively
   harmless one to have in a third_party I silenced the warning for that
   library:
     - `-Wunused`
     - `-Wabsolute-value`
     - `-Wimplicit-exception-spec-mismatch`
     - `-Wdeprecated`
3. Explicit template instantiations are widespread across the project
   and look intentional, so I silenced the warning project-wide.
     - -Winstantiation-after-specialization
4. There are a few warnings that I think should probably be fixed in the
   project, but are are not egregious and aren't really relevant for our
   work, so I just silenced them in my own CMakePresets.json.
     - `-Wunneeded-internal-declaration`
     - `-Wdeprecated`
     - `-Wvla-cxx-extension`
5. "no case matching constant switch condition" can't be turned off,
   so I updated the third_party source (which is copied in-tree anyway).
   Maybe we can upstream that at some point.
6. There's one other warning that looks like it could be related to
   hipification. Now we can actually see it! Based on discussions like
   https://gitlab.com/libeigen/eigen/-/issues/1751, I'm not sure if
   these are a problem or not or how to best handle it, so I haven't
   fixed it.
     - `-Wpass-failed=transform-warning`
@GMNGeoffrey
Copy link
Collaborator Author

@jeffdailiy PTAL. Based on previous PRs. Only 5e13b63 (#5) is part of this PR

@GMNGeoffrey GMNGeoffrey changed the base branch from master to hipify-inplace January 22, 2025 17:53
@GMNGeoffrey GMNGeoffrey requested a review from jeffdaily January 22, 2025 18:03
@@ -258,7 +258,7 @@ void LibraVertexCut(
}
}
}
delete cache;
delete[] cache;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ooof, I like that compiler warnings caught this.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hooray for clang! 😁

@GMNGeoffrey GMNGeoffrey merged commit f660893 into nod-ai:hipify-inplace Jan 29, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants