Rearchitect and Reinstate GPU #114

onurulgen · 2024-08-29T13:05:59Z

Reinstated GPU support in the project by fixing or implementing the missing functions.
Rearchitected the project to combine the GPU and CPU implementations using virtual functions.
Increased performance by up to 11 times by using the GPU.
Implemented unit and regression tests for GPU and CPU using the Catch2 framework.
Developed GitHub Actions workflows for tests, coverage, static code analysis, and producing CPU and GPU executables for Windows, Linux, and macOS.
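As a rough illustration of combining CPU and GPU implementations behind virtual functions, here is a minimal sketch. The class names `Compute` and `GpuCompute`, the `ScaleGradient` operation, and the `MakeCompute` factory are hypothetical and are not taken from the pull request; the GPU override simply reuses the CPU loop so the example stays runnable without CUDA.

```cpp
#include <memory>
#include <string>
#include <vector>

// Hypothetical base class: the CPU implementation is the default behaviour.
class Compute {
public:
    virtual ~Compute() = default;
    virtual std::string Name() const { return "cpu"; }
    // Scale every gradient value in place.
    virtual void ScaleGradient(std::vector<float>& gradient, float factor) const {
        for (float& g : gradient) g *= factor;
    }
};

// Hypothetical GPU backend; a real one would launch a CUDA kernel here.
class GpuCompute : public Compute {
public:
    std::string Name() const override { return "gpu"; }
    void ScaleGradient(std::vector<float>& gradient, float factor) const override {
        // Placeholder: reuse the CPU loop so the sketch runs without a GPU.
        Compute::ScaleGradient(gradient, factor);
    }
};

// Callers choose a backend once and afterwards only use the base interface.
std::unique_ptr<Compute> MakeCompute(bool useGpu) {
    if (useGpu) return std::make_unique<GpuCompute>();
    return std::make_unique<Compute>();
}
```

With this shape, the registration code can hold a `std::unique_ptr<Compute>` and call `ScaleGradient` without knowing which platform was selected.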

codecov · 2024-08-29T13:23:44Z

Welcome to Codecov 🎉

Once you merge this PR into your default branch, you're all set! Codecov will compare coverage reports and display results in all future pull requests.

Thanks for integrating Codecov - We've got you covered ☂️

github-actions · 2024-08-29T13:26:21Z

✅Code Analysis Results - no issues found! ✅

msseibel · 2024-08-30T13:42:50Z

I had to add set(CMAKE_CUDA_ARCHITECTURES "native") to the CMakeLists.txt file. Otherwise, I got the following error, "CMAKE_CUDA_ARCHITECTURES must be non-empty if set".

Besides that, the build worked, and I was able to run the program.

onurulgen · 2024-08-30T13:51:31Z

> I had to add set(CMAKE_CUDA_ARCHITECTURES "native") to the CMakeLists.txt file. Otherwise, I got the following error, "CMAKE_CUDA_ARCHITECTURES must be non-empty if set".
>
> Besides that, the build worked, and I was able to run the program.

CMAKE_CUDA_ARCHITECTURES is set here. If you needed to set it yourself, your cmake configure command must have failed while detecting CUDA, right?

msseibel · 2024-08-30T13:58:42Z

Yes, that is correct. It failed as soon as it reached line 156 enable_language(CUDA).

Copilot

Copilot reviewed 670 out of 685 changed files in this pull request and generated 1 comment.

Files not reviewed (15)

.github/workflows/linux.yml: Language not supported
.github/workflows/macos.yml: Language not supported
.github/workflows/windows.yml: Language not supported
CMakeLists.txt: Language not supported
Doxyfile.in: Language not supported
cmake/FindOPENCL.cmake: Language not supported
cmake/NIFTYREGConfig.cmake.in: Language not supported
niftyreg_build_version.txt: Language not supported
reg-apps/CMakeLists.txt: Language not supported
reg-apps/reg_average.cpp: Language not supported
reg-apps/reg_benchmark.cpp: Language not supported
reg-apps/reg_gpuinfo.cpp: Language not supported
reg-apps/reg_jacobian.cpp: Language not supported
reg-apps/reg_measure.cpp: Language not supported
reg-apps/reg_resample.cpp: Language not supported

Comments suppressed due to low confidence (1)

.github/workflows/tests.yml:32

The use of ${{ matrix.sudo }} might not be necessary for all platforms. Consider removing it or using a conditional statement.

${{ matrix.sudo }} cmake --build build/ --target install --config ${{ matrix.build_type }}

reg-io/RNifti/NiftiImage_impl.h

+    if (image == nullptr)
+        throw std::runtime_error("Failed to read image from path " + path);
+
+    size_t brickSize = image->nbyper * image->nx * image->ny * image->nz;


The product is computed in the operands' narrower type and only then converted to size_t, so it can overflow before the conversion takes place. Casting one of the operands to size_t first makes the entire multiplication run in size_t and prevents the overflow.

The fix modifies the line where the multiplication occurs to cast image->nbyper to size_t before the multiplication.
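The effect of the cast can be sketched as follows, assuming all four header fields are plain `int` (the actual field types depend on the NIfTI version in use). `BrickSize` and `ExceedsInt` are hypothetical helper names used only for this illustration.

```cpp
#include <cstddef>
#include <limits>

// Assumption: the four operands are non-negative ints.
std::size_t BrickSize(int nbyper, int nx, int ny, int nz) {
    // The cast widens the first operand, so each later multiplication
    // is carried out in std::size_t rather than in int.
    return static_cast<std::size_t>(nbyper) * nx * ny * nz;
}

// True when a value could not have been produced by int arithmetic.
bool ExceedsInt(std::size_t value) {
    return value > static_cast<std::size_t>(std::numeric_limits<int>::max());
}
```

For a 1000 x 1000 x 600 volume with 4 bytes per voxel, the result is 2,400,000,000 bytes, which is larger than a 32-bit `int` can hold; without the cast that product would overflow before being stored.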

reg-io/png/reg_png.cpp

+    nifti_image *niiImage = nullptr;
+    if (readData) {
+
+        uch *image_data = static_cast<uch*>(malloc(width * height * channels * sizeof(uch)));


To fix the problem, we need to ensure that the multiplication is performed using a larger integer type to avoid overflow. This can be done by casting one of the operands to unsigned long before performing the multiplication. This way, the multiplication will be done using the larger type, and the result will be correctly represented.

The specific change involves casting one of the variables (width, height, or channels) to unsigned long before the multiplication. This change should be made on line 83 of the file reg-io/png/reg_png.cpp.
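One caveat worth keeping in mind: `unsigned long` is 32 bits wide on 64-bit Windows, so `std::size_t` may be the safer widening target there. The sketch below goes one step further and rejects sizes that do not fit at all. `ImageBytes` is a hypothetical helper, not part of reg_png.cpp, and it assumes one byte per channel sample.

```cpp
#include <cstddef>
#include <limits>
#include <optional>

// Hypothetical helper: returns width * height * channels, or std::nullopt
// when the product cannot be represented in std::size_t.
std::optional<std::size_t> ImageBytes(std::size_t width,
                                      std::size_t height,
                                      std::size_t channels) {
    constexpr std::size_t max = std::numeric_limits<std::size_t>::max();
    // Check each multiplication by dividing the limit first.
    if (height != 0 && width > max / height) return std::nullopt;
    const std::size_t pixels = width * height;
    if (channels != 0 && pixels > max / channels) return std::nullopt;
    return pixels * channels;
}
```

A caller would pass the result to `malloc` only when the optional holds a value, and report an error otherwise.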

reg-lib/Optimiser.cpp

+    reduction(+:gg, dgg)
+#endif
+        for (i = 0; i < num; i++) {
+            gg += array2Ptr[i] * array1Ptr[i];


To fix the problem, we need to ensure that the multiplication is performed using the larger type (double) to avoid overflow. This can be achieved by casting the operands to double before performing the multiplication. Specifically, we need to update the lines where the multiplication occurs to cast array2Ptr[i] and array1Ptr[i] to double.

reg-lib/Optimiser.cpp

+#endif
+        for (i = 0; i < num; i++) {
+            gg += array2Ptr[i] * array1Ptr[i];
+            dgg += (gradientPtr[i] + array1Ptr[i]) * gradientPtr[i];


To fix the problem, we need to ensure that the multiplication is performed using the larger type double to avoid overflow. This can be done by casting the operands to double before performing the multiplication. This change should be made in the file reg-lib/Optimiser.cpp on line 278.
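A minimal sketch of the widening described above, assuming the arrays hold `float` values and the accumulator is a `double` (the real Optimiser code may use other element types). `DotWide` is a hypothetical name.

```cpp
#include <cstddef>

// Assumption: inputs are float arrays of length n; the sum is kept in double.
double DotWide(const float* a, const float* b, std::size_t n) {
    double sum = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        // Widen both operands first so the product itself is formed in double.
        sum += static_cast<double>(a[i]) * static_cast<double>(b[i]);
    }
    return sum;
}
```

For example, 4097 x 4097 = 16785409 needs 25 significant bits, which a `float` product cannot hold exactly, while the `double` product is exact.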

reg-lib/Optimiser.cpp

+    reduction(+:dggBw)
+#endif
+            for (i = 0; i < numBw; i++) {
+                ggBw += array2PtrBw[i] * array1PtrBw[i];


To fix the problem, we need to ensure that the multiplication is performed using double precision to avoid overflow. This can be achieved by casting one of the operands to double before performing the multiplication. Specifically, we will cast array2PtrBw[i] to double in the multiplication expression.

reg-lib/cpu/_reg_blockMatching.cpp

+      this->totalBlock[i] = in->totalBlock[i];
+
+   this->referencePosition = (float *)malloc(this->activeBlockNumber * this->dim * sizeof(float));
+   this->warpedPosition = (float *)malloc(this->activeBlockNumber * this->dim * sizeof(float));


To fix the problem, we need to ensure that the multiplication is performed using a larger integer type to avoid overflow. This can be achieved by casting one of the operands to size_t before performing the multiplication. This way, the multiplication will be done using the larger type, and the result will be correctly handled by the malloc function.

Specifically, we will cast this->activeBlockNumber to size_t before multiplying it by this->dim on lines 35 and 36. This change will ensure that the multiplication is performed using the size_t type, preventing any potential overflow.
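One way to keep the element count consistent is to compute it once in `std::size_t` and reuse it for allocation and iteration. The sketch below uses a hypothetical `BlockParams` struct as a stand-in for the block-matching parameters and `std::vector` in place of `malloc`; it assumes both fields are non-negative.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the block-matching parameters.
struct BlockParams {
    int activeBlockNumber;
    int dim;
};

// Number of floats needed for one position array, computed in std::size_t.
std::size_t PositionCount(const BlockParams& p) {
    return static_cast<std::size_t>(p.activeBlockNumber) *
           static_cast<std::size_t>(p.dim);
}

// Allocates a zero-initialised position array of the right length.
std::vector<float> MakePositions(const BlockParams& p) {
    return std::vector<float>(PositionCount(p), 0.0f);
}
```

Any loop over the positions can then use the vector's `size()` as its bound instead of repeating the `int` multiplication.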

reg-lib/cpu/_reg_blockMatching.cpp

         //params->activeBlock = (int *)malloc(params->activeBlockNumber * sizeof(int));
-         params->referencePosition = (float *)malloc(params->activeBlockNumber * params->dim * sizeof(float));
+   params->referencePosition = (float *)malloc(params->activeBlockNumber * params->dim * sizeof(float));


To fix the problem, we need to cast the operands of the multiplication to a larger type before performing the multiplication. This will ensure that the multiplication is done using the larger type, preventing overflow. Specifically, we should cast params->activeBlockNumber and params->dim to size_t before multiplying them. This change should be made on line 269 in the file reg-lib/cpu/_reg_blockMatching.cpp.

reg-lib/cpu/_reg_globalTrans.cpp

               {
                  voxel[0]= (double) deformationFieldPtrX[index];
                  voxel[1]= (double) deformationFieldPtrY[index];
                  voxel[2]= (double) deformationFieldPtrZ[index];
               }
-               reg_mat44_mul(&transformationMatrix, voxel, position);
+               Mat44Mul(transformationMatrix, voxel, position);


To fix the problem, the multiplication must be performed in a larger integer type to avoid overflow. Casting one of the operands to size_t before the multiplication achieves this; specifically, cast z to size_t on line 106.

No additional methods, imports, or definitions are needed to implement this change.
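The index computation can be sketched as below, assuming the coordinates and image extents are non-negative `int` values stored in x-fastest order. `VoxelIndex` is a hypothetical helper, not a function from _reg_globalTrans.cpp.

```cpp
#include <cstddef>

// Assumption: x, y, z, nx and ny are non-negative and x varies fastest.
std::size_t VoxelIndex(int x, int y, int z, int nx, int ny) {
    // Widening z first keeps z * nx * ny out of int arithmetic.
    const std::size_t plane = static_cast<std::size_t>(z) * nx * ny;
    const std::size_t row = static_cast<std::size_t>(y) * nx;
    return plane + row + static_cast<std::size_t>(x);
}
```

With nx = 2000, ny = 1000 and z = 1500 the plane offset alone is 3,000,000,000, which is beyond the range of a 32-bit `int`.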

reg-lib/cpu/_reg_nmi.cpp

+    const double *logHistoPtr = jointHistogramLog[currentTimePoint];
+    const double *entropyPtr = entropyValues[currentTimePoint];
+    const double nmi = (entropyPtr[0] + entropyPtr[1]) / entropyPtr[2];
+    const size_t referenceOffset = referenceBinNumber[currentTimePoint] * floatingBinNumber[currentTimePoint];


To fix the problem, we need to ensure that the multiplication is performed using the larger type (size_t) to prevent overflow. This can be achieved by casting one of the operands to size_t before performing the multiplication. This way, the multiplication will be done using the larger type, and the result will be correctly stored in the size_t variable.

The specific change involves casting referenceBinNumber[currentTimePoint] to size_t before multiplying it with floatingBinNumber[currentTimePoint].
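A small sketch of the offset arithmetic, assuming the bin counts are stored as `unsigned short` (an assumption; the declarations are not shown here). On common platforms such operands are promoted to `int` before multiplication, so the cast is what moves the product into `std::size_t`. `HistogramOffsets` and `ComputeOffsets` are hypothetical names.

```cpp
#include <cstddef>

struct HistogramOffsets {
    std::size_t reference;  // referenceBins * floatingBins
    std::size_t floating;   // reference + referenceBins
};

// Assumption: bin counts are unsigned short values.
HistogramOffsets ComputeOffsets(unsigned short referenceBins,
                                unsigned short floatingBins) {
    // Widen before multiplying so the product is formed in std::size_t.
    const std::size_t referenceOffset =
        static_cast<std::size_t>(referenceBins) * floatingBins;
    const std::size_t floatingOffset = referenceOffset + referenceBins;
    return {referenceOffset, floatingOffset};
}
```

With 64 bins on each side the offsets are 4096 and 4160; at the extreme of 65535 bins per side the product is 4294836225, which no longer fits in a 32-bit `int`.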

reg-lib/cpu/_reg_nmi.cpp

+    const double *logHistoPtr = jointHistogramLog[currentTimePoint];
+    const double *entropyPtr = entropyValues[currentTimePoint];
+    const double nmi = (entropyPtr[0] + entropyPtr[1]) / entropyPtr[2];
+    const size_t referenceOffset = referenceBinNumber[currentTimePoint] * floatingBinNumber[currentTimePoint];


To fix the problem, we need to ensure that the multiplication is performed using a larger integer type to prevent overflow. This can be achieved by casting the operands to size_t before performing the multiplication. This way, the multiplication will be done using the larger type, and the result will be correctly assigned to the size_t variable.

onurulgen added 30 commits March 7, 2023 15:30

Refactorise NiftiImage

4015cbf

Add utility functions to NiftiImage

5bf6ca8

Fix dimensions after initialisation of NiftiImage

a3b0cc9

Fix a bug causing accessing freed memory in reg_io_WriteImageFile()

c863946

Refactorisations

43686ab

Refactor reg_createControlPointGrid() using automatic memory management

61de023

Refactor reg_createSymmetricControlPointGrids() using automatic memory management

001d498

Refactor reg_createImagePyramid() using automatic memory management

058d4e9

Refactor reg_createMaskPyramid() using automatic memory management

0c1e715

Refactor reg_aladin class using automatic memory management

3a6d10c

Refactor reg_aladin_sym class using automatic memory management

1130e1f

Refactor reg_aladin app using automatic memory management

591fa91

Refactor reg_base class using automatic memory management

4e5db2c

Refactor reg_f3d class using automatic memory management

099572c

Refactor reg_f3d2 class using automatic memory management

76c6652

Refactor reg_f3d app using automatic memory management

ea8fac0

Refactor reg_test_imageGradient using NiftiImage

d9bc22b

Small fixes

4f22230

Add NiftiImage::disown() to release the wrapped pointer

876a88d

Use NiftiImage::disown() in reg_test_imageGradient

495ce95

Refactor reg_test_interpolation using NiftiImage

8f96921

Add NiftiImage::realloc() to reallocate the image data

751f447

Add NiftiImage::setDim() to set a dimension of the image

19883fb

Add ability to NiftiImage for copying only image info

d13cf2d

Update tests to leverage new abilities of NiftiImage

5f92c68

Add an enum for NiftiImage dimensions

4947c2e

Small fixes

d087265

Add ability to NiftiImageData for extracting volume data

379c8f9

Handle optimise* variables in Compute::NormaliseGradient()

4061036

Refactorisations

b226687

onurulgen added 11 commits June 2, 2024 00:37

Refactorisations

43c39fa

Update macOS images in GitHub Actions

7cfe146

Update the OS image for Coverage

41cd5a1

Fix linting issues

65934f2

Use NiftiImage instead of nifti_image in _reg_ReadWriteImage

dd78e8e

Delete duplicate reg_io_ReadImageHeader() and add this ability to reg_io_ReadImageFile()

Change the image acquisition behaviour of NiftiImage

cbbdd00

Don't own the image pointer if it's constructed by using a nifti_image pointer

Revert "Enable CRT secure warnings"

1cea3df

This reverts commit 6cbbccd.

Fix *Content::CastImageData() and eliminate its duplicates

2915900

Use NiftiImage in Content classes

d515493

Revert "Use float gam instead of double in CudaOptimiser"

7e1e926

This reverts commit b9c9bec.

Fix Floor(), Ceil(), and Round() functions

a148f14

onurulgen self-assigned this Aug 29, 2024

onurulgen linked an issue Aug 29, 2024 that may be closed by this pull request

Rearchitect F3D to reinstate GPU #92

Open

Merge branch 'master' into rearchitect-and-reinstate-gpu

da1da73

onurulgen requested review from mmodat and ericspod August 29, 2024 13:12

onurulgen requested a review from Copilot February 12, 2025 16:20

Copilot AI reviewed Feb 12, 2025

onurulgen added 5 commits February 28, 2025 15:01

Fix typos

31d42d2

Fix performance bugs

a4496ae

Don't set build type for coverage but warn

f280b81

Use Git commit hash as the build number

ed3d1ba

Remove build type checks for non-MSVC environments

591c7cf

github-advanced-security bot found potential problems Feb 28, 2025

Update macOS version in CI/CD workflows to macOS 13

7248e60

@@ -1298,3 +1298,3 @@
-                size_t brickSize = image->nbyper * image->nx * image->ny * image->nz;
+                size_t brickSize = static_cast<size_t>(image->nbyper) * image->nx * image->ny * image->nz;
                 image->data = calloc(1, nifti_get_volsize(image));
@@ -1310,3 +1310,3 @@
-                size_t brickSize = image->nbyper * image->nx * image->ny * image->nz;
+                size_t brickSize = static_cast<size_t>(image->nbyper) * image->nx * image->ny * image->nz;
                 image->data = calloc(1, nifti2_get_volsize(image));

@@ -82,3 +82,3 @@
-                    uch *image_data = static_cast<uch*>(malloc(width * height * channels * sizeof(uch)));
+                    uch *image_data = static_cast<uch*>(malloc(static_cast<unsigned long>(width) * height * channels * sizeof(uch)));
                     if (image_data == nullptr)

@@ -276,4 +276,4 @@
                     for (i = 0; i < num; i++) {
-                        gg += array2Ptr[i] * array1Ptr[i];
-                        dgg += (gradientPtr[i] + array1Ptr[i]) * gradientPtr[i];
+                        gg += static_cast<double>(array2Ptr[i]) * static_cast<double>(array1Ptr[i]);
+                        dgg += (static_cast<double>(gradientPtr[i]) + static_cast<double>(array1Ptr[i])) * static_cast<double>(gradientPtr[i]);
                     }
@@ -290,4 +290,4 @@
                         for (i = 0; i < numBw; i++) {
-                            ggBw += array2PtrBw[i] * array1PtrBw[i];
-                            dggBw += (gradientPtrBw[i] + array1PtrBw[i]) * gradientPtrBw[i];
+                            ggBw += static_cast<double>(array2PtrBw[i]) * static_cast<double>(array1PtrBw[i]);
+                            dggBw += (static_cast<double>(gradientPtrBw[i]) + static_cast<double>(array1PtrBw[i])) * static_cast<double>(gradientPtrBw[i]);
                         }

@@ -290,3 +290,3 @@
                         for (i = 0; i < numBw; i++) {
-                            ggBw += array2PtrBw[i] * array1PtrBw[i];
+                            ggBw += static_cast<double>(array2PtrBw[i]) * array1PtrBw[i];
                             dggBw += (gradientPtrBw[i] + array1PtrBw[i]) * gradientPtrBw[i];

@@ -34,4 +34,4 @@
-               this->referencePosition = (float *)malloc(this->activeBlockNumber * this->dim * sizeof(float));
-               this->warpedPosition = (float *)malloc(this->activeBlockNumber * this->dim * sizeof(float));
+               this->referencePosition = (float *)malloc((size_t)this->activeBlockNumber * this->dim * sizeof(float));
+               this->warpedPosition = (float *)malloc((size_t)this->activeBlockNumber * this->dim * sizeof(float));
                for(int i=0; i<this->activeBlockNumber*this->dim ; ++i){

@@ -268,4 +268,4 @@
                      //params->activeBlock = (int *)malloc(params->activeBlockNumber * sizeof(int));
-               params->referencePosition = (float *)malloc(params->activeBlockNumber * params->dim * sizeof(float));
-               params->warpedPosition = (float *)malloc(params->activeBlockNumber * params->dim * sizeof(float));
+               params->referencePosition = (float *)malloc((size_t)params->activeBlockNumber * (size_t)params->dim * sizeof(float));
+               params->warpedPosition = (float *)malloc((size_t)params->activeBlockNumber * (size_t)params->dim * sizeof(float));

@@ -105,3 +105,3 @@
                {
-                  index=z*deformationFieldImage->nx*deformationFieldImage->ny;
+                  index=(size_t)z*deformationFieldImage->nx*deformationFieldImage->ny;
                   voxel[2]=(double) z;

@@ -427,3 +427,3 @@
                 const double nmi = (entropyPtr[0] + entropyPtr[1]) / entropyPtr[2];
-                const size_t referenceOffset = referenceBinNumber[currentTimePoint] * floatingBinNumber[currentTimePoint];
+                const size_t referenceOffset = static_cast<size_t>(referenceBinNumber[currentTimePoint]) * floatingBinNumber[currentTimePoint];
                 const size_t floatingOffset = referenceOffset + referenceBinNumber[currentTimePoint];
