Merge ~bullwinkle-team/ubuntu/+source/hipfft:bullwinkle/llvm-21/ubuntu/devel into ubuntu/+source/hipfft:ubuntu/devel
- Git
- lp:~bullwinkle-team/ubuntu/+source/hipfft
- bullwinkle/llvm-21/ubuntu/devel
- Merge into ubuntu/devel
| Status: | Merged | ||||
|---|---|---|---|---|---|
| Approved by: | Andreas Hasenack | ||||
| Approved revision: | 9230b376984059cb3b468ca7c94659fcda3a16ca | ||||
| Merged at revision: | 9230b376984059cb3b468ca7c94659fcda3a16ca | ||||
| Proposed branch: | ~bullwinkle-team/ubuntu/+source/hipfft:bullwinkle/llvm-21/ubuntu/devel | ||||
| Merge into: | ubuntu/+source/hipfft:ubuntu/devel | ||||
| Diff against target: |
12215 lines (+8585/-849) 64 files modified
.github/CODEOWNERS (+2/-2) CHANGELOG.md (+21/-2) CMakeLists.txt (+10/-29) LICENSE.md (+3/-1) README.md (+23/-22) clients/CMakeLists.txt (+2/-2) clients/bench/CMakeLists.txt (+0/-16) clients/bench/bench.cpp (+8/-0) clients/hipfft_params.h (+551/-64) clients/hipfftw_helper.h (+1238/-0) clients/samples/CMakeLists.txt (+11/-4) clients/samples/hipfft_callback.cpp (+8/-0) clients/tests/CMakeLists.txt (+113/-22) clients/tests/accuracy_test_1D.cpp (+45/-35) clients/tests/accuracy_test_2D.cpp (+21/-0) clients/tests/accuracy_test_3D.cpp (+22/-0) clients/tests/accuracy_test_callback.cpp (+21/-16) clients/tests/gtest_main.cpp (+164/-143) clients/tests/hipfft_accuracy_test.cpp (+292/-65) clients/tests/hipfft_mpi_worker.cpp (+3/-0) clients/tests/hipfftw_test.cpp (+2214/-0) clients/tests/multi_device_test.cpp (+29/-11) clients/tests/multi_stream_test.cpp (+875/-0) clients/tests/simple_test.cpp (+114/-9) cmake/dependencies.cmake (+16/-25) debian/bin/bug/libhipfftw0 (+9/-0) debian/changelog (+39/-0) debian/control (+29/-12) debian/libhipfft-dev.install (+1/-0) debian/libhipfftw0.install (+2/-0) debian/libhipfftw0.lintian-overrides (+2/-0) debian/libhipfftw0.symbols.amd64 (+47/-0) debian/patches/add-so-version-for-libhipfftw0.patch (+50/-0) debian/patches/bump-rocm-docs-core-in-docs-sphinx.patch (+65/-0) debian/patches/fix-doxygen-refs.patch (+37/-32) debian/patches/series (+3/-3) debian/rules (+16/-8) debian/tests/control (+1/-1) dev/null (+0/-22) docs/doxygen/Doxyfile (+1/-1) docs/index.rst (+6/-2) docs/install/building-installing-hipfft.rst (+12/-30) docs/reference/fft-api-usage.rst (+20/-6) docs/sphinx/_toc.yml.in (+1/-1) docs/sphinx/requirements.in (+1/-1) docs/sphinx/requirements.txt (+135/-4) library/CMakeLists.txt (+52/-46) library/include/hipfft/hipfft.h (+39/-31) library/include/hipfft/hipfftw.h (+142/-0) library/src/CMakeLists.txt (+4/-1) library/src/amd_detail/hipfft.cpp (+22/-44) library/src/amd_detail/hipfftw.cpp (+1627/-0) library/src/nvidia_detail/hipfft.cpp (+2/-0) rtest.xml (+3/-0) shared/accuracy_test.h (+23/-9) shared/environment.h (+2/-2) shared/fft_params.h (+163/-22) shared/gpubuf.h (+14/-10) shared/hostbuf.h (+62/-34) shared/params_gen.h (+46/-17) shared/rocfft_params.h (+14/-10) shared/sys_mem.h (+81/-29) shared/test_params.h (+2/-0) toolchain-windows.cmake (+4/-3) |
||||
| Related bugs: |
|
| Reviewer | Review Type | Date Requested | Status |
|---|---|---|---|
| Andreas Hasenack | Approve | ||
| Ubuntu Sponsors | Pending | ||
|
Review via email:
|
|||
Commit message
Description of the change
Update to new upstream version 7.1.0
| Bojan Aleksovski (b0b0a) wrote : | # |
| Bojan Aleksovski (b0b0a) wrote : | # |
`reverse-depends --arch ppc64el src:hipfft -x` returns No reverse dependencies found.
| Andreas Hasenack (ahasenack) wrote : | # |
--- a/debian/control
+++ b/debian/control
+Package: libhipfftw0
+Section: libs
+Architecture: amd64 arm64
+XB-X-ROCm-
+Depends: ${misc:Depends}, ${shlibs:Depends},
+Description: portable interface for Fast Fourier Transforms on the GPU - library
+ hipFFTW is a GPU-aware library for fast Fourier transforms using rocFFT as the
+ backend. It exports an interface borrowing the most commonly-used symbols of
+ FFTW. hipFFTW does not require its computational input and output to be
+ directly accessible by the GPU.
+ .
+ This package provides the AMD ROCm hipFFTW library.
Do you know if debian is going to add this same binary package, and also have libhipfft-dev pull in both libhipfft0 and this new libhipfftw0, like below?
Package: libhipfft-dev
Section: libdevel
-Architecture: amd64 arm64 ppc64el
-Depends: libhipfft0 (= ${binary:
- libamdhip64-dev,
+Architecture: amd64 arm64
+Depends: libhipfft0 (= ${binary:
+ ${misc:Depends}, ${shlibs:Depends}, libamdhip64-dev,
Dealing with differences in packaging is a very annoying delta do carry in Ubuntu specially if Debian decides on a different split.
| Andreas Hasenack (ahasenack) wrote : | # |
--- /dev/null
+++ b/debian/
@@ -0,0 +1,45 @@
+From: Bojan Aleksovski <email address hidden>
+Date: Wed, 23 Jan 2026 16:05:35 +0100
+Subject: Add versioning for hipfftw.so
+
+Summary: This patch is fully based on upstream commit
+dfa828c45abcbe
+additional fixes in the checking that hipfftw is not an alias
+
+Origin: upstream, https:/
+Bug: https:/
+Forwarded: not-needed
I see the difference:
-+ if (aliased_
++ if (NOT aliased_
Could you please elaborate in the patch header why this was necessary? It's better than just to say "tiny change" and leave us guessing where it was and why :)
And why forwarding to upstream is not needed? Maybe it will become clear after the reasoning, but not yet.
| Andreas Hasenack (ahasenack) wrote : | # |
--- a/debian/
+++ b/debian/
@@ -1,28 +1,19 @@
-From: Kari Pahula <email address hidden>
-Date: Fri, 28 Jun 2024 16:07:05 +0300
-Subject: doxygen-refs
+From: Bojan Aleksovski <email address hidden>
+Date: Wed, 20 Jan 2026 20:55:15 +0100
+Subject: fix doxygen refs
...
What's the nature of these changes? This diff looks like the whole patch is being rewritten, including authorship. If it's indeed necessary, I would prefer if the original patch is dropped, and a new one is added with a different name.
Alternatively, to keep the existing name and authorship, you could add something about the change below the description, and add the Last-Updated DEP3 header with an updated date indicating when this change was made. But at a glance it looks like the first option (delete old, add new with new name) would be clearer?
And is this submittable to upstream?
| Andreas Hasenack (ahasenack) wrote : | # |
--- a/debian/rules
+++ b/debian/rules
@@ -17,8 +24,7 @@ CMAKE_FLAGS = \
# The hipfft test client requires the rocfft sources to build.
ifeq (,$(filter nocheck,
-CMAKE_FLAGS += -DBUILD_
- -DUSE_HIPRAND=OFF
+CMAKE_FLAGS += -DBUILD_
endif
%:
I don't think this change is described in d/changelog, and why it's needed (remove USE_HIPRAND=OFF).
| Andreas Hasenack (ahasenack) wrote : | # |
d/changelog:
* debian: new binary package libhipfftw0
Nit: that should be "* d/control: new binary package libhipfftw0"
While at it, you should probably also add a line about adding libhipfftw0 to the list of dependencies of libhipfft-dev.
Also in d/changelog:
* d/control: fix hipfft-doc build
What in d/control fixed the hipfft-doc build? I think you meant d/rules?
| Bojan Aleksovski (b0b0a) wrote : | # |
Hello Andreas,
Regarding how the packages are organized, we took this direction after discussing with Cory in order to not have diff with Debian.
libhipfftw0 will be introduced as a new binary package.
Regarding the dev package, initially on the table was having separate hipfft and hipfftw dev packages but as they shared one header file we needed to introduce common-dev package that both would depend on. This option seemed far-fetched and it was agreed to leave the dev package as it is.
Regarding the tests package, on the table was renaming it to libhipfft-tests (dropping the 0 as the tests are for both bin pkgs), but as libhipfftw is basically a shim this seemed as not needed complexity as well.
Btw, this made me realize that I left out libhipfftw0 as a dependency of libhipfft0-tests, will fix that.
| Bojan Aleksovski (b0b0a) wrote : | # |
While I was writing the reasoning about how the packages are organized, I saw that you left the other notes. Thank you, I will provide reasoning and fixes soon. Have a great weekend!
| Bojan Aleksovski (b0b0a) wrote : | # |
Hello Andreas,
Please find the effort to have the comments fixed:
- Regarding add-so-
- Regarding doxygen-refs.patch, it has been dropped. A new fix-doxygen-
- Regarding the change that does `remove USE_HIPRAND=OFF`, it has been forgotten in the previous changelog entry 7.1.0-0ubuntu1. I've added it. That change refers to commit https:/
- Regarding the changelog entry for the new binary package libhipfftw0, it has been updated and a new line about having libhipfftw0 as a dependency for libhipfft-dev and libhipfft0-tests has been added
- The typo in the changelog about fix hipfft-doc build has been resolved
The easiest place to view the mentioned fixes is https:/
Thank you!
| Andreas Hasenack (ahasenack) wrote : | # |
I noticed there is no symbols file for arm64, just amd64, but that is not a bug being introduced in this PR. At some point it should be fixed, though.
+1
| Andreas Hasenack (ahasenack) wrote : | # |
Sponsored:
Uploading hipfft_
Uploading hipfft_
Uploading hipfft_
Uploading hipfft_
Uploading hipfft_
This has a binary NEW package, so it will require an archive admin review before it lands in proposed.
| Bojan Aleksovski (b0b0a) wrote : | # |
Thank you Andreas. Bug has been created regarding the arm64 symbols (using rocblas bug as reference), so it gets fixed soon: https:/
Preview Diff
| 1 | diff --git a/.github/CODEOWNERS b/.github/CODEOWNERS |
| 2 | old mode 100755 |
| 3 | new mode 100644 |
| 4 | index eff670f..056a93f |
| 5 | --- a/.github/CODEOWNERS |
| 6 | +++ b/.github/CODEOWNERS |
| 7 | @@ -1,8 +1,8 @@ |
| 8 | -* @af-ayala @eng-flavio-teixeira @evetsso @feizheng10 @malcolmroberts |
| 9 | +* @af-ayala @eng-flavio-teixeira @evetsso @malcolmroberts @regan-amd |
| 10 | # Documentation files |
| 11 | docs/ @ROCm/rocm-documentation |
| 12 | *.md @ROCm/rocm-documentation |
| 13 | *.rst @ROCm/rocm-documentation |
| 14 | .readthedocs.yaml @ROCm/rocm-documentation |
| 15 | # Header directory for Doxygen documentation |
| 16 | -library/include/ @ROCm/rocm-documentation @af-ayala @eng-flavio-teixeira @evetsso @feizheng10 @malcolmroberts |
| 17 | +library/include/ @ROCm/rocm-documentation @af-ayala @eng-flavio-teixeira @evetsso @malcolmroberts @regan-amd |
| 18 | diff --git a/.jenkins/common.groovy b/.jenkins/common.groovy |
| 19 | deleted file mode 100644 |
| 20 | index e0aea53..0000000 |
| 21 | --- a/.jenkins/common.groovy |
| 22 | +++ /dev/null |
| 23 | @@ -1,92 +0,0 @@ |
| 24 | -import static groovy.io.FileType.FILES |
| 25 | - |
| 26 | -def runCompileCommand(platform, project, jobName, boolean sameOrg = false) |
| 27 | -{ |
| 28 | - project.paths.construct_build_prefix() |
| 29 | - |
| 30 | - def getDependenciesCommand = "" |
| 31 | - if (project.installLibraryDependenciesFromCI) |
| 32 | - { |
| 33 | - project.libraryDependencies.each |
| 34 | - { libraryName -> |
| 35 | - getDependenciesCommand += auxiliary.getLibrary(libraryName, platform.jenkinsLabel, null, sameOrg) |
| 36 | - } |
| 37 | - } |
| 38 | - |
| 39 | - String cmake = platform.jenkinsLabel.contains('centos') ? "cmake3" : "cmake" |
| 40 | - String warningArgs = platform.jenkinsLabel.contains('cuda') ? '':'-DWERROR=ON' |
| 41 | - String hipClang = platform.jenkinsLabel.contains('hipClang') ? "HIP_COMPILER=clang" : "" |
| 42 | - String path = platform.jenkinsLabel.contains('centos7') ? "export PATH=/opt/rh/devtoolset-7/root/usr/bin:$PATH" : ":" |
| 43 | - String dir = jobName.contains('Debug') ? "debug" : "release" |
| 44 | - |
| 45 | - // hipcc with CUDA backend needs HIP_PLATFORM set accordingly in the environment |
| 46 | - String hipPlatformCommand = platform.jenkinsLabel.contains("cuda") ? "export HIP_PLATFORM=nvidia" : "" |
| 47 | - |
| 48 | - def command = """#!/usr/bin/env bash |
| 49 | - set -x |
| 50 | - |
| 51 | - ls /fftw/lib |
| 52 | - export FFTW_ROOT=/fftw |
| 53 | - export FFTW_INCLUDE_PATH=\${FFTW_ROOT}/include |
| 54 | - export FFTW_LIB_PATH=\${FFTW_ROOT}/lib |
| 55 | - export LD_LIBRARY_PATH=\${FFTW_LIB_PATH}:/opt/rocm/lib:/opt/rocm/hip/lib |
| 56 | - export CPLUS_INCLUDE_PATH=\${FFTW_INCLUDE_PATH}:\${CPLUS_INCLUDE_PATH} |
| 57 | - export CMAKE_PREFIX_PATH=\${FFTW_LIB_PATH}/cmake/fftw3:\${CMAKE_PREFIX_PATH} |
| 58 | - export CMAKE_PREFIX_PATH=\${FFTW_LIB_PATH}/cmake/fftw3f:\${CMAKE_PREFIX_PATH} |
| 59 | - # default container flags cause problems for CUDA backend, and aren't useful for ROCm |
| 60 | - unset HIPCC_COMPILE_FLAGS_APPEND |
| 61 | - unset HIPCC_LINK_FLAGS_APPEND |
| 62 | - ${hipPlatformCommand} |
| 63 | - |
| 64 | - cd ${project.paths.project_build_prefix} |
| 65 | - mkdir -p build/${dir} && cd build/${dir} |
| 66 | - ${getDependenciesCommand} |
| 67 | - ${path} |
| 68 | - ${hipClang} ${cmake} ${warningArgs} ${project.paths.build_command} |
| 69 | - make -j\$(nproc) |
| 70 | - """ |
| 71 | - |
| 72 | - platform.runCommand(this, command) |
| 73 | -} |
| 74 | - |
| 75 | -def runTestCommand (platform, project, gfilter) |
| 76 | -{ |
| 77 | - String cudaArgs = platform.jenkinsLabel.contains('cuda') ? '--double_epsilon=5e-11' |
| 78 | - : '--precompile=rocfft-test-precompile.db' |
| 79 | - |
| 80 | - def command = """#!/usr/bin/env bash |
| 81 | - set -x |
| 82 | - cd ${project.paths.project_build_prefix}/build/release/clients/staging |
| 83 | - GTEST_LISTENER=NO_PASS_LINE_IN_LOG ./hipfft-test ${cudaArgs} --gtest_output=xml --gtest_color=yes --gtest_filter=${gfilter} |
| 84 | - """ |
| 85 | - |
| 86 | - platform.runCommand(this, command) |
| 87 | - //junit "${project.paths.project_build_prefix}/build/release/clients/staging/*.xml" |
| 88 | -} |
| 89 | - |
| 90 | -def runPackageCommand(platform, project, jobName, label='') |
| 91 | -{ |
| 92 | - def command |
| 93 | - |
| 94 | - label = label != '' ? '-' + label.toLowerCase() : '' |
| 95 | - String ext = platform.jenkinsLabel.contains('ubuntu') ? "deb" : "rpm" |
| 96 | - String dir = jobName.contains('Debug') ? "debug" : "release" |
| 97 | - |
| 98 | - command = """ |
| 99 | - set -x |
| 100 | - cd ${project.paths.project_build_prefix}/build/${dir} |
| 101 | - make package |
| 102 | - mkdir -p package |
| 103 | - for f in hipfft*.$ext |
| 104 | - do |
| 105 | - mv "\$f" "hipfft${label}-\${f#*-}" |
| 106 | - done |
| 107 | - mv *.${ext} package/ |
| 108 | - """ |
| 109 | - |
| 110 | - platform.runCommand(this, command) |
| 111 | - platform.archiveArtifacts(this, """${project.paths.project_build_prefix}/build/${dir}/package/*.${ext}""") |
| 112 | -} |
| 113 | - |
| 114 | - |
| 115 | -return this |
| 116 | diff --git a/.jenkins/debug.groovy b/.jenkins/debug.groovy |
| 117 | deleted file mode 100644 |
| 118 | index abac808..0000000 |
| 119 | --- a/.jenkins/debug.groovy |
| 120 | +++ /dev/null |
| 121 | @@ -1,83 +0,0 @@ |
| 122 | -#!/usr/bin/env groovy |
| 123 | -@Library('rocJenkins@pong') _ |
| 124 | - |
| 125 | -import com.amd.project.* |
| 126 | -import com.amd.docker.* |
| 127 | -import java.nio.file.Path |
| 128 | - |
| 129 | -def runCI = |
| 130 | -{ |
| 131 | - nodeDetails, jobName, buildCommand -> |
| 132 | - |
| 133 | - def prj = new rocProject('hipFFT', 'Debug') |
| 134 | - // customize for project |
| 135 | - prj.paths.build_command = buildCommand |
| 136 | - prj.libraryDependencies = ['rocRAND', 'rocFFT', 'hipRAND'] |
| 137 | - |
| 138 | - // Define test architectures, optional rocm version argument is available |
| 139 | - def nodes = new dockerNodes(nodeDetails, jobName, prj) |
| 140 | - |
| 141 | - def commonGroovy |
| 142 | - |
| 143 | - boolean formatCheck = false |
| 144 | - |
| 145 | - def compileCommand = |
| 146 | - { |
| 147 | - platform, project-> |
| 148 | - |
| 149 | - project.paths.construct_build_prefix() |
| 150 | - |
| 151 | - commonGroovy = load "${project.paths.project_src_prefix}/.jenkins/common.groovy" |
| 152 | - commonGroovy.runCompileCommand(platform, project, jobName) |
| 153 | - } |
| 154 | - |
| 155 | - buildProject(prj, formatCheck, nodes.dockerArray, compileCommand, null, null) |
| 156 | -} |
| 157 | - |
| 158 | -def setupCI(urlJobName, jobNameList, buildCommand, runCI, label) |
| 159 | -{ |
| 160 | - jobNameList = auxiliary.appendJobNameList(jobNameList) |
| 161 | - |
| 162 | - jobNameList.each |
| 163 | - { |
| 164 | - jobName, nodeDetails-> |
| 165 | - if (urlJobName == jobName) |
| 166 | - stage(label + ' ' + jobName) { |
| 167 | - runCI(nodeDetails, jobName, buildCommand, label) |
| 168 | - } |
| 169 | - } |
| 170 | - |
| 171 | - // For url job names that are not listed by the jobNameList i.e. compute-rocm-dkms-no-npi-1901 |
| 172 | - if(!jobNameList.keySet().contains(urlJobName)) |
| 173 | - { |
| 174 | - properties(auxiliary.addCommonProperties([pipelineTriggers([cron('0 1 * * *')])])) |
| 175 | - stage(label + ' ' + urlJobName) { |
| 176 | - runCI([ubuntu18:['gfx906']], urlJobName, buildCommand, label) |
| 177 | - } |
| 178 | - } |
| 179 | -} |
| 180 | - |
| 181 | -ci: { |
| 182 | - String urlJobName = auxiliary.getTopJobName(env.BUILD_URL) |
| 183 | - |
| 184 | - def propertyList = ["compute-rocm-dkms-no-npi":[pipelineTriggers([cron('0 1 * * 0')])]] |
| 185 | - |
| 186 | - propertyList = auxiliary.appendPropertyList(propertyList) |
| 187 | - |
| 188 | - propertyList.each |
| 189 | - { |
| 190 | - jobName, property-> |
| 191 | - if (urlJobName == jobName) |
| 192 | - properties(auxiliary.addCommonProperties(property)) |
| 193 | - } |
| 194 | - |
| 195 | - def hostJobNameList = ["compute-rocm-dkms-no-npi":([ubuntu18:['gfx900'],centos7:['gfx906'],sles15sp1:['gfx906']])] |
| 196 | - |
| 197 | - def hipClangJobNameList = ["compute-rocm-dkms-no-npi":([ubuntu18:['gfx900'],centos7:['gfx906'],sles15sp1:['gfx906']])] |
| 198 | - |
| 199 | - String hostBuildCommand = '-DCMAKE_CXX_COMPILER=g++ -DCMAKE_BUILD_TYPE=Debug -L ../..' |
| 200 | - String hipClangBuildCommand = '-DCMAKE_CXX_COMPILER=/opt/rocm/bin/amdclang++ -DCMAKE_BUILD_TYPE=Debug -DBUILD_CLIENTS_TESTS=ON -DBUILD_CLIENTS_SAMPLES=ON -DBUILD_CLIENTS_SAMPLES=ON -L ../..' |
| 201 | - |
| 202 | - setupCI(urlJobName, hostJobNameList, hostBuildCommand, runCI, 'g++') |
| 203 | - setupCI(urlJobName, hipClangJobNameList, hipClangBuildCommand, runCI, 'hip-clang') |
| 204 | -} |
| 205 | diff --git a/.jenkins/multigpu.groovy b/.jenkins/multigpu.groovy |
| 206 | deleted file mode 100644 |
| 207 | index 9c315df..0000000 |
| 208 | --- a/.jenkins/multigpu.groovy |
| 209 | +++ /dev/null |
| 210 | @@ -1,96 +0,0 @@ |
| 211 | -#!/usr/bin/env groovy |
| 212 | -@Library('rocJenkins@pong') _ |
| 213 | - |
| 214 | -import com.amd.project.* |
| 215 | -import com.amd.docker.* |
| 216 | -import java.nio.file.Path |
| 217 | - |
| 218 | -def runCI = |
| 219 | -{ |
| 220 | - nodeDetails, jobName, buildCommand, label, runTest -> |
| 221 | - |
| 222 | - def prj = new rocProject('hipFFT', 'multigpu') |
| 223 | - // customize for project |
| 224 | - prj.paths.build_command = buildCommand |
| 225 | - prj.libraryDependencies = ['rocRAND', 'rocFFT', 'hipRAND'] |
| 226 | - prj.timeout.test = 360 |
| 227 | - |
| 228 | - // Define test architectures, optional rocm version argument is available |
| 229 | - def nodes = new dockerNodes(nodeDetails, jobName, prj) |
| 230 | - |
| 231 | - def commonGroovy |
| 232 | - |
| 233 | - boolean formatCheck = false |
| 234 | - |
| 235 | - def compileCommand = |
| 236 | - { |
| 237 | - platform, project-> |
| 238 | - |
| 239 | - project.paths.construct_build_prefix() |
| 240 | - |
| 241 | - commonGroovy = load "${project.paths.project_src_prefix}/.jenkins/common.groovy" |
| 242 | - commonGroovy.runCompileCommand(platform, project,jobName) |
| 243 | - } |
| 244 | - |
| 245 | - def testCommand = |
| 246 | - { |
| 247 | - platform, project-> |
| 248 | - |
| 249 | - def gfilter = "*multi_gpu*" |
| 250 | - commonGroovy.runTestCommand(platform, project, gfilter) |
| 251 | - } |
| 252 | - |
| 253 | - def packageCommand = |
| 254 | - { |
| 255 | - platform, project-> |
| 256 | - |
| 257 | - commonGroovy.runPackageCommand(platform, project, jobName, label) |
| 258 | - } |
| 259 | - |
| 260 | - buildProject(prj, formatCheck, nodes.dockerArray, compileCommand, runTest ? testCommand : null, packageCommand) |
| 261 | -} |
| 262 | - |
| 263 | -def setupCI(urlJobName, jobNameList, buildCommand, runCI, label, runTest) |
| 264 | -{ |
| 265 | - jobNameList = auxiliary.appendJobNameList(jobNameList) |
| 266 | - |
| 267 | - jobNameList.each |
| 268 | - { |
| 269 | - jobName, nodeDetails-> |
| 270 | - if (urlJobName == jobName) |
| 271 | - stage(label + ' ' + jobName) { |
| 272 | - runCI(nodeDetails, jobName, buildCommand, label, runTest) |
| 273 | - } |
| 274 | - } |
| 275 | - |
| 276 | - // For url job names that are not listed by the jobNameList i.e. compute-rocm-dkms-no-npi-1901 |
| 277 | - if(!jobNameList.keySet().contains(urlJobName)) |
| 278 | - { |
| 279 | - properties(auxiliary.addCommonProperties([pipelineTriggers([cron('0 1 * * *')])])) |
| 280 | - stage(label + ' ' + urlJobName) { |
| 281 | - runCI([ubuntu18:['gfx906']], urlJobName, buildCommand, label) |
| 282 | - } |
| 283 | - } |
| 284 | -} |
| 285 | - |
| 286 | -ci: { |
| 287 | - String urlJobName = auxiliary.getTopJobName(env.BUILD_URL) |
| 288 | - |
| 289 | - def propertyList = ["main":[pipelineTriggers([cron('0 6 * * 0')])]] |
| 290 | - |
| 291 | - propertyList = auxiliary.appendPropertyList(propertyList) |
| 292 | - |
| 293 | - def jobNameList = ["main":([ubuntu20:['8gfx90a']])] |
| 294 | - jobNameList = auxiliary.appendJobNameList(jobNameList) |
| 295 | - |
| 296 | - propertyList.each |
| 297 | - { |
| 298 | - jobName, property-> |
| 299 | - if (urlJobName == jobName) |
| 300 | - properties(auxiliary.addCommonProperties(property)) |
| 301 | - } |
| 302 | - |
| 303 | - String hipClangBuildCommand = '-DCMAKE_CXX_COMPILER=/opt/rocm/bin/amdclang++ -DCMAKE_BUILD_TYPE=RelWithDebInfo -DBUILD_CLIENTS_TESTS=ON -DBUILD_CLIENTS_SAMPLES=ON -L ../..' |
| 304 | - |
| 305 | - setupCI(urlJobName, jobNameList, hipClangBuildCommand, runCI, 'hip-clang', true) |
| 306 | -} |
| 307 | diff --git a/.jenkins/staticanalysis.groovy b/.jenkins/staticanalysis.groovy |
| 308 | deleted file mode 100644 |
| 309 | index dd54dde..0000000 |
| 310 | --- a/.jenkins/staticanalysis.groovy |
| 311 | +++ /dev/null |
| 312 | @@ -1,32 +0,0 @@ |
| 313 | -#!/usr/bin/env groovy |
| 314 | -@Library('rocJenkins@pong') _ |
| 315 | - |
| 316 | -import com.amd.project.* |
| 317 | -import com.amd.docker.* |
| 318 | -import java.nio.file.Path |
| 319 | - |
| 320 | -def runCI = |
| 321 | -{ |
| 322 | - nodeDetails, jobName -> |
| 323 | - |
| 324 | - def prj = new rocProject('hipFFT-internal', 'PreCheckin') |
| 325 | - // customize for project |
| 326 | - prj.libraryDependencies = ['rocRAND','rocFFT'] |
| 327 | - |
| 328 | - // Define test architectures, optional rocm version argument is available |
| 329 | - def nodes = new dockerNodes(nodeDetails, jobName, prj) |
| 330 | - |
| 331 | - boolean formatCheck = true |
| 332 | - boolean staticAnalysis = true |
| 333 | - |
| 334 | - buildProject(prj, formatCheck, nodes.dockerArray, null, null, null, staticAnalysis) |
| 335 | -} |
| 336 | - |
| 337 | -ci: { |
| 338 | - String urlJobName = auxiliary.getTopJobName(env.BUILD_URL) |
| 339 | - |
| 340 | - properties(auxiliary.addCommonProperties([pipelineTriggers([cron('0 1 * * 7')])])) |
| 341 | - stage(urlJobName) { |
| 342 | - runCI([ubuntu22:['any']], urlJobName) |
| 343 | - } |
| 344 | -} |
| 345 | diff --git a/.jenkins/staticlibrary.groovy b/.jenkins/staticlibrary.groovy |
| 346 | deleted file mode 100644 |
| 347 | index 1e99ac3..0000000 |
| 348 | --- a/.jenkins/staticlibrary.groovy |
| 349 | +++ /dev/null |
| 350 | @@ -1,98 +0,0 @@ |
| 351 | -#!/usr/bin/env groovy |
| 352 | -@Library('rocJenkins@pong') _ |
| 353 | - |
| 354 | -import com.amd.project.* |
| 355 | -import com.amd.docker.* |
| 356 | -import java.nio.file.Path |
| 357 | - |
| 358 | -def runCI = |
| 359 | -{ |
| 360 | - nodeDetails, jobName, buildCommand, label -> |
| 361 | - |
| 362 | - def prj = new rocProject('hipFFT-internal', 'StaticLibrary') |
| 363 | - // customize for project |
| 364 | - prj.paths.build_command = buildCommand |
| 365 | - prj.libraryDependencies = ['rocRAND','rocFFT'] |
| 366 | - |
| 367 | - // Define test architectures, optional rocm version argument is available |
| 368 | - def nodes = new dockerNodes(nodeDetails, jobName, prj) |
| 369 | - |
| 370 | - def commonGroovy |
| 371 | - |
| 372 | - boolean formatCheck = false |
| 373 | - |
| 374 | - def compileCommand = |
| 375 | - { |
| 376 | - platform, project-> |
| 377 | - |
| 378 | - project.paths.construct_build_prefix() |
| 379 | - |
| 380 | - commonGroovy = load "${project.paths.project_src_prefix}/.jenkins/common.groovy" |
| 381 | - commonGroovy.runCompileCommand(platform, project, jobName, true) |
| 382 | - } |
| 383 | - |
| 384 | - def testCommand = |
| 385 | - { |
| 386 | - platform, project-> |
| 387 | - |
| 388 | - def gfilter = "*" |
| 389 | - commonGroovy.runTestCommand(platform, project, gfilter) |
| 390 | - } |
| 391 | - |
| 392 | - def packageCommand = |
| 393 | - { |
| 394 | - platform, project-> |
| 395 | - |
| 396 | - commonGroovy.runPackageCommand(platform, project, jobName, label) |
| 397 | - } |
| 398 | - |
| 399 | - buildProject(prj, formatCheck, nodes.dockerArray, compileCommand, testCommand, packageCommand) |
| 400 | -} |
| 401 | - |
| 402 | -def setupCI(urlJobName, jobNameList, buildCommand, runCI, label) |
| 403 | -{ |
| 404 | - jobNameList = auxiliary.appendJobNameList(jobNameList) |
| 405 | - |
| 406 | - jobNameList.each |
| 407 | - { |
| 408 | - jobName, nodeDetails-> |
| 409 | - if (urlJobName == jobName) |
| 410 | - stage(label + ' ' + jobName) { |
| 411 | - runCI(nodeDetails, jobName, buildCommand, label) |
| 412 | - } |
| 413 | - } |
| 414 | - |
| 415 | - // For url job names that are not listed by the jobNameList i.e. compute-rocm-dkms-no-npi-1901 |
| 416 | - if(!jobNameList.keySet().contains(urlJobName)) |
| 417 | - { |
| 418 | - properties(auxiliary.addCommonProperties([pipelineTriggers([cron('0 1 * * *')])])) |
| 419 | - stage(label + ' ' + urlJobName) { |
| 420 | - runCI([ubuntu16:['gfx906']], urlJobName, buildCommand, label) |
| 421 | - } |
| 422 | - } |
| 423 | -} |
| 424 | - |
| 425 | -ci: { |
| 426 | - String urlJobName = auxiliary.getTopJobName(env.BUILD_URL) |
| 427 | - |
| 428 | - def propertyList = ["compute-rocm-dkms-no-npi":[pipelineTriggers([cron('0 1 * * 0')])]] |
| 429 | - |
| 430 | - propertyList = auxiliary.appendPropertyList(propertyList) |
| 431 | - |
| 432 | - propertyList.each |
| 433 | - { |
| 434 | - jobName, property-> |
| 435 | - if (urlJobName == jobName) |
| 436 | - properties(auxiliary.addCommonProperties(property)) |
| 437 | - } |
| 438 | - |
| 439 | - def hostJobNameList = ["compute-rocm-dkms-no-npi-hipclang":([ubuntu18:['gfx900']])] |
| 440 | - |
| 441 | - def hipClangJobNameList = ["compute-rocm-dkms-no-npi-hipclang":([ubuntu18:['gfx900']])] |
| 442 | - |
| 443 | - String hostBuildCommand = '-DCMAKE_CXX_COMPILER=g++ -DCMAKE_BUILD_TYPE=RelWithDebInfo -DBUILD_SHARED_LIBS=OFF -L ../..' |
| 444 | - String hipClangBuildCommand = '-DCMAKE_CXX_COMPILER=/opt/rocm/bin/amdclang++ -DCMAKE_BUILD_TYPE=RelWithDebInfo -DBUILD_CLIENTS_TESTS=ON -DBUILD_CLIENTS_SAMPLES=ON -DBUILD_SHARED_LIBS=OFF -L ../..' |
| 445 | - |
| 446 | - setupCI(urlJobName, hostJobNameList, hostBuildCommand, runCI, 'g++') |
| 447 | - setupCI(urlJobName, hipClangJobNameList, hipClangBuildCommand, runCI, 'hip-clang') |
| 448 | -} |
| 449 | diff --git a/CHANGELOG.md b/CHANGELOG.md |
| 450 | index d9c3dd8..257c7a5 100644 |
| 451 | --- a/CHANGELOG.md |
| 452 | +++ b/CHANGELOG.md |
| 453 | @@ -3,6 +3,26 @@ |
| 454 | Documentation for hipFFT is available at |
| 455 | [https://rocm.docs.amd.com/projects/hipFFT/en/latest/](https://rocm.docs.amd.com/projects/hipFFT/en/latest/). |
| 456 | |
| 457 | +## hipFFT 1.0.21 for ROCm 7.1.0 |
| 458 | + |
| 459 | +### Added |
| 460 | + |
| 461 | +* Improved test coverage of multi-stream plans. |
| 462 | +* Improved test coverage of user-specified work areas. |
| 463 | +* Improved test coverage of default stride calculation. |
| 464 | +* **[Experimental]** Introduced the hipFFTW library, interfacing rocFFT on AMD platforms using the same symbols as FFTW3 (with partial support). |
| 465 | + |
| 466 | +## hipFFT 1.0.20 for ROCm 7.0.0 |
| 467 | + |
| 468 | +### Added |
| 469 | + |
| 470 | +* Added gfx950 support. |
| 471 | + |
| 472 | +### Removed |
| 473 | + |
| 474 | +* Removed hipfft-rider legacy compatibility from clients |
| 475 | +* Remove support for the gfx940 and gfx941 targets from the client programs. |
| 476 | + |
| 477 | ## hipFFT 1.0.18 for ROCm 6.4.0 |
| 478 | |
| 479 | ### Added |
| 480 | @@ -10,8 +30,7 @@ Documentation for hipFFT is available at |
| 481 | * Implemented the `hipfftMpAttachComm`, `hipfftXtSetDistribution`, and `hipfftXtSetSubformatDefault` APIs to allow |
| 482 | computing FFTs that are distributed between multiple MPI (Message Passing Interface) processes. These APIs can be enabled |
| 483 | with the `HIPFFT_MPI_ENABLE` CMake option, which defaults to `OFF`. |
| 484 | - |
| 485 | - The backend FFT library called by hipFFT must support MPI for these APIs to work. |
| 486 | +* The backend FFT library called by hipFFT must support MPI for these APIs to work. |
| 487 | |
| 488 | ### Changed |
| 489 | |
| 490 | diff --git a/CMakeLists.txt b/CMakeLists.txt |
| 491 | index 7f81897..8f802ac 100644 |
| 492 | --- a/CMakeLists.txt |
| 493 | +++ b/CMakeLists.txt |
| 494 | @@ -1,5 +1,5 @@ |
| 495 | # ############################################################################# |
| 496 | -# Copyright (C) 2020 - 2022 Advanced Micro Devices, Inc. All rights reserved. |
| 497 | +# Copyright (C) 2020 - 2025 Advanced Micro Devices, Inc. All rights reserved. |
| 498 | # |
| 499 | # Permission is hereby granted, free of charge, to any person obtaining a copy |
| 500 | # of this software and associated documentation files (the "Software"), to deal |
| 501 | @@ -37,7 +37,6 @@ else() |
| 502 | "Install path prefix, prepended onto install directories" ) |
| 503 | endif() |
| 504 | |
| 505 | - |
| 506 | # Workarounds.. |
| 507 | list( APPEND CMAKE_PREFIX_PATH /opt/rocm/llvm /opt/rocm ) |
| 508 | list( APPEND CMAKE_MODULE_PATH ${ROCM_PATH}/lib/cmake/hip /opt/rocm/lib/cmake/hip /opt/rocm/hip/cmake ) |
| 509 | @@ -83,9 +82,8 @@ set(DEFAULT_GPUS |
| 510 | gfx906 |
| 511 | gfx908 |
| 512 | gfx90a |
| 513 | - gfx940 |
| 514 | - gfx941 |
| 515 | gfx942 |
| 516 | + gfx950 |
| 517 | gfx1030 |
| 518 | gfx1100 |
| 519 | gfx1101 |
| 520 | @@ -101,8 +99,6 @@ if(BUILD_ADDRESS_SANITIZER) |
| 521 | SET(DEFAULT_GPUS |
| 522 | gfx908:xnack+ |
| 523 | gfx90a:xnack+ |
| 524 | - gfx940:xnack+ |
| 525 | - gfx941:xnack+ |
| 526 | gfx942:xnack+) |
| 527 | add_link_options(-fuse-ld=lld) |
| 528 | add_compile_definitions(ADDRESS_SANITIZER) |
| 529 | @@ -167,14 +163,8 @@ if (BUILD_WITH_COMPILER STREQUAL "HIP-NVCC" ) |
| 530 | set( WARNING_FLAGS ${NVCC_WARNING_FLAGS} ) |
| 531 | |
| 532 | else() |
| 533 | - # Define GPU targets |
| 534 | - if(AMDGPU_TARGETS AND NOT GPU_TARGETS) |
| 535 | - message( DEPRECATION "AMDGPU_TARGETS use is deprecated. Use GPU_TARGETS." ) |
| 536 | - endif() |
| 537 | - set(AMDGPU_TARGETS "${DEFAULT_GPUS}" CACHE STRING "Target default GPUs if AMDGPU_TARGETS is not defined. (Deprecated, prefer GPU_TARGETS)") |
| 538 | - rocm_check_target_ids(AMDGPU_TARGETS TARGETS "${AMDGPU_TARGETS}") |
| 539 | - # Don't force, users should be able to override GPU_TARGETS at the command line if desired |
| 540 | - set(GPU_TARGETS "${AMDGPU_TARGETS}" CACHE STRING "GPU architectures to build for") |
| 541 | + # Check support for the GPU target(s) |
| 542 | + rocm_check_target_ids(GPU_TARGETS TARGETS "${GPU_TARGETS}") |
| 543 | if( BUILD_WITH_COMPILER STREQUAL "HIP-CLANG" ) |
| 544 | set( HIP_PLATFORM "amd" ) |
| 545 | set( HIP_COMPILER "clang" ) |
| 546 | @@ -184,22 +174,11 @@ endif() |
| 547 | # Show the actual compiler(internal option) |
| 548 | message(STATUS "BUILD_WITH_COMPILER = " ${BUILD_WITH_COMPILER}) |
| 549 | |
| 550 | -# FOR HANDLING ENABLE/DISABLE OPTIONAL BACKWARD COMPATIBILITY for FILE/FOLDER REORG |
| 551 | -option(BUILD_FILE_REORG_BACKWARD_COMPATIBILITY "Build with file/folder reorg with backward compatibility enabled" OFF) |
| 552 | -if(BUILD_FILE_REORG_BACKWARD_COMPATIBILITY AND NOT WIN32) |
| 553 | - rocm_wrap_header_dir( |
| 554 | - ${CMAKE_SOURCE_DIR}/library/include |
| 555 | - PATTERNS "*.h" |
| 556 | - GUARDS SYMLINK WRAPPER |
| 557 | - WRAPPER_LOCATIONS ${CMAKE_INSTALL_INCLUDEDIR} |
| 558 | - ) |
| 559 | -endif() |
| 560 | - |
| 561 | # Version |
| 562 | -set( VERSION_STRING "1.0.18" ) |
| 563 | +set( VERSION_STRING "1.0.21" ) |
| 564 | set( hipfft_SOVERSION 0.1 ) |
| 565 | |
| 566 | -if( ROCM_FOUND ) |
| 567 | +if( ROCmCMakeBuildTools_FOUND ) |
| 568 | rocm_setup_version( VERSION ${VERSION_STRING} ) |
| 569 | endif() |
| 570 | |
| 571 | @@ -262,7 +241,9 @@ endif() |
| 572 | if(WIN32) |
| 573 | set(CPACK_SOURCE_GENERATOR "ZIP") |
| 574 | set(CPACK_GENERATOR "ZIP") |
| 575 | - set(CMAKE_INSTALL_PREFIX "C:/hipSDK" CACHE PATH "Install path" FORCE) |
| 576 | + if(CMAKE_INSTALL_PREFIX_INITIALIZED_TO_DEFAULT) |
| 577 | + set(CMAKE_INSTALL_PREFIX "C:/hipSDK" CACHE PATH "Install path" FORCE) |
| 578 | + endif() |
| 579 | set(INSTALL_PREFIX "C:/hipSDK") |
| 580 | set(CPACK_SET_DESTDIR OFF) |
| 581 | set(CPACK_PACKAGE_INSTALL_DIRECTORY "C:/hipSDK") |
| 582 | @@ -270,7 +251,7 @@ if(WIN32) |
| 583 | set(CPACK_INCLUDE_TOPLEVEL_DIRECTORY OFF) |
| 584 | endif() |
| 585 | |
| 586 | -if( ROCM_FOUND ) |
| 587 | +if( ROCmCMakeBuildTools_FOUND ) |
| 588 | # Package specific CPACK vars |
| 589 | if( NOT BUILD_WITH_LIB STREQUAL "CUDA" ) |
| 590 | rocm_package_add_dependencies(DEPENDS "rocfft >= 1.0.21") |
| 591 | diff --git a/LICENSE.md b/LICENSE.md |
| 592 | index f081475..ee33c80 100644 |
| 593 | --- a/LICENSE.md |
| 594 | +++ b/LICENSE.md |
| 595 | @@ -1,6 +1,6 @@ |
| 596 | MIT License |
| 597 | |
| 598 | -Copyright (C) 2016 - 2025 Advanced Micro Devices, Inc. All rights reserved. |
| 599 | +Copyright (C) Advanced Micro Devices, Inc. |
| 600 | |
| 601 | Permission is hereby granted, free of charge, to any person obtaining a copy |
| 602 | of this software and associated documentation files (the "Software"), to deal |
| 603 | @@ -20,6 +20,8 @@ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, |
| 604 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE |
| 605 | SOFTWARE. |
| 606 | |
| 607 | +--- |
| 608 | + |
| 609 | This product includes software from copyright holders as shown below, and distributed under their license terms as specified. |
| 610 | |
| 611 | CLI11 2.2 Copyright (c) 2017-2024 University of Cincinnati, developed by Henry |
| 612 | diff --git a/README.md b/README.md |
| 613 | index d47fd12..5961742 100644 |
| 614 | --- a/README.md |
| 615 | +++ b/README.md |
| 616 | @@ -1,7 +1,7 @@ |
| 617 | # hipFFT |
| 618 | |
| 619 | hipFFT is an FFT marshalling library that supports |
| 620 | -[rocFFT](https://github.com/ROCmSoftwarePlatform/rocFFT) and |
| 621 | +[rocFFT](https://github.com/ROCm/rocm-libraries/tree/develop/projects/rocfft) and |
| 622 | [cuFFT](https://developer.nvidia.com/cufft) backends. |
| 623 | |
| 624 | hipFFT exports an interface that doesn't require the client to change, regardless of the chosen backend. |
| 625 | @@ -11,12 +11,12 @@ and marshals results back to your application. |
| 626 | ## Documentation |
| 627 | |
| 628 | > [!NOTE] |
| 629 | -> The published hipFFT documentation is available at [hipFFT](https://rocm.docs.amd.com/projects/hipFFT/en/latest/index.html) in an organized, easy-to-read format, with search and a table of contents. The documentation source files reside in the hipFFT/docs folder of this repository. As with all ROCm projects, the documentation is open source. For more information, see [Contribute to ROCm documentation](https://rocm.docs.amd.com/en/latest/contribute/contributing.html). |
| 630 | +> The published hipFFT documentation is available at [hipFFT](https://rocm.docs.amd.com/projects/hipFFT/en/latest/index.html) in an organized, easy-to-read format, with search and a table of contents. The documentation source files reside in the projects/hipfft/docs folder of the rocm-libraries repository. As with all ROCm projects, the documentation is open source. For more information, see [Contribute to ROCm documentation](https://rocm.docs.amd.com/en/latest/contribute/contributing.html). |
| 631 | |
| 632 | To build our documentation locally, run the following code: |
| 633 | |
| 634 | ```bash |
| 635 | -cd docs |
| 636 | +cd projects/hipfft/docs |
| 637 | |
| 638 | pip3 install -r sphinx/requirements.txt |
| 639 | |
| 640 | @@ -36,14 +36,13 @@ To build hipFFT from source, follow these steps: |
| 641 | |
| 642 | 1. Install the library build dependencies: |
| 643 | |
| 644 | - * On AMD platforms, you must install [rocFFT](https://github.com/ROCmSoftwarePlatform/rocFFT). |
| 645 | - * On NVIDIA platforms, you must install [cuFFT](https://developer.nvidia.com/cufft). |
| 646 | + * On AMD platforms, you must install [rocFFT](https://github.com/ROCm/rocm-libraries/tree/develop/projects/rocfft). |
| 647 | |
| 648 | 2. Install the client build dependencies: |
| 649 | |
| 650 | * The clients (samples, tests, etc) included with the hipFFT source depend on hipRAND, FFTW and GoogleTest. |
| 651 | |
| 652 | -3. Build hipFFT: |
| 653 | +3. Build hipFFT. Run these commands from the `rocm-libraries/projects/hipfft` directory: |
| 654 | |
| 655 | To show all build options: |
| 656 | |
| 657 | @@ -52,40 +51,42 @@ To build hipFFT from source, follow these steps: |
| 658 | cmake -LH .. |
| 659 | ``` |
| 660 | |
| 661 | -Here are some CMake build examples: |
| 662 | +Here are some CMake build examples for an AMD GPU: |
| 663 | |
| 664 | -* AMD GPU |
| 665 | - * Case: Build a project using HIP language APIs + hipFFT with standard host compiler |
| 666 | - * Code: `cmake -DCMAKE_CXX_COMPILER=g++ -DCMAKE_BUILD_TYPE=Release -L ..` |
| 667 | - * Case: Build a project using HIP language APIs + hipFFT + device kernels with HIP-Clang |
| 668 | - * Code: `cmake -DCMAKE_CXX_COMPILER=amdclang++ -DCMAKE_BUILD_TYPE=Release -DBUILD_CLIENTS=ON -L ..` |
| 669 | -* NVIDIA GPU |
| 670 | - * Case: Build a project using HIP language APIs + hipFFT with standard host compiler |
| 671 | - * Code: `cmake -DCMAKE_CXX_COMPILER=g++ -DCMAKE_BUILD_TYPE=Release -DBUILD_WITH_LIB=CUDA -L ..` |
| 672 | - * Case: Build a project using HIP language APIs + hipFFT + device kernels with HIP-NVCC |
| 673 | - * Code: `HIP_PLATFORM=nvidia cmake -DCMAKE_CXX_COMPILER=hipcc -DCMAKE_BUILD_TYPE=Release -DBUILD_CLIENTS=ON -L ..` |
| 674 | +* Case: Build a project using HIP language APIs + hipFFT with standard host compiler |
| 675 | + * Code: `cmake -DCMAKE_CXX_COMPILER=g++ -DCMAKE_BUILD_TYPE=Release -L ..` |
| 676 | +* Case: Build a project using HIP language APIs + hipFFT + device kernels with HIP-Clang |
| 677 | + * Code: `cmake -DCMAKE_CXX_COMPILER=amdclang++ -DCMAKE_BUILD_TYPE=Release -DBUILD_CLIENTS=ON -L ..` |
| 678 | |
| 679 | ```note |
| 680 | The `-DBUILD_CLIENTS=ON` option is only allowed with the amdclang++ or HIPCC compilers. |
| 681 | ``` |
| 682 | |
| 683 | +## Code Coverage |
| 684 | +You can generate a test coverage report with the following: |
| 685 | + |
| 686 | +```bash |
| 687 | +cmake -DCMAKE_CXX_COMPILER=amdclang++ -DBUILD_CLIENTS_SAMPLES=ON -DBUILD_CLIENTS_TESTS=ON -DBUILD_CODE_COVERAGE=ON <optional: -DCOVERAGE_TEST_OPTIONS="cmdline args to pass to hipfft-test (default: --smoketest)"> .. |
| 688 | +make -j coverage |
| 689 | +``` |
| 690 | +The commands above will output the coverage report to the terminal and save an html coverage report to `$PWD/coverage-report`. Note that hipFFT uses llvm for code coverage, which only works with clang compilers. |
| 691 | + |
| 692 | ## Porting from CUDA |
| 693 | |
| 694 | If you have existing CUDA code and want to transition to HIP, follow these steps: |
| 695 | |
| 696 | 1. [HIPIFY](https://github.com/ROCm-Developer-Tools/HIPIFY) your code and fix all unsupported CUDA |
| 697 | features and user-defined macros |
| 698 | -2. Build with HIP-NVCC to run on an NVIDIA device |
| 699 | -3. Build with HIP-Clang to run on an AMD device |
| 700 | +2. Build with HIP-Clang to run on an AMD device |
| 701 | |
| 702 | More information about porting to HIP is available in the |
| 703 | [HIP porting guide](https://rocm.docs.amd.com/projects/HIP/en/develop/user_guide/hip_porting_guide.html). |
| 704 | |
| 705 | ## Support |
| 706 | |
| 707 | -You can report bugs and feature requests through the GitHub |
| 708 | -[issue tracker](https://github.com/ROCm/hipFFT/issues). |
| 709 | +You can report bugs and feature requests through the rocm-libraries GitHub |
| 710 | +[issue tracker](https://github.com/ROCm/rocm-libraries/issues). |
| 711 | |
| 712 | ## Contribute |
| 713 | |
| 714 | -If you want to contribute to hipFFT, you must follow our [contribution guidelines](https://github.com/ROCm/hipFFT/blob/develop/.github/CONTRIBUTING.md). |
| 715 | +If you want to contribute to hipFFT, you must follow our [contribution guidelines](https://github.com/ROCm/rocm-libraries/blob/develop/projects/hipfft/.github/CONTRIBUTING.md). |
| 716 | diff --git a/clients/CMakeLists.txt b/clients/CMakeLists.txt |
| 717 | index 739ef03..67de6e8 100644 |
| 718 | --- a/clients/CMakeLists.txt |
| 719 | +++ b/clients/CMakeLists.txt |
| 720 | @@ -88,9 +88,9 @@ if( BUILD_CLIENTS_SAMPLES ) |
| 721 | endif( ) |
| 722 | |
| 723 | if( BUILD_CLIENTS_TESTS ) |
| 724 | - find_package( GTest 1.11.0 ) |
| 725 | + find_package( GTest QUIET ) |
| 726 | include( ExternalProject ) |
| 727 | - if( NOT GTEST_FOUND ) |
| 728 | + if( NOT GTest_FOUND ) |
| 729 | set( GTEST_INCLUDE_DIRS |
| 730 | ${CMAKE_CURRENT_BINARY_DIR}/src/gtest/googletest/include |
| 731 | ) |
| 732 | diff --git a/clients/bench/CMakeLists.txt b/clients/bench/CMakeLists.txt |
| 733 | index c6da335..4cece34 100644 |
| 734 | --- a/clients/bench/CMakeLists.txt |
| 735 | +++ b/clients/bench/CMakeLists.txt |
| 736 | @@ -93,19 +93,3 @@ set_target_properties( hipfft-bench |
| 737 | ${BENCH_OUT_DIR} ) |
| 738 | |
| 739 | rocm_install(TARGETS hipfft-bench COMPONENT benchmarks) |
| 740 | - |
| 741 | -# install compatibility for old name of bench program - symlink on |
| 742 | -# unix, hardlink on windows (since privilege is required to create |
| 743 | -# symlinks there) |
| 744 | -if( WIN32 ) |
| 745 | - set( BENCH_LINK_COMMAND create_hardlink ) |
| 746 | - set( BENCH_NEW_NAME ${CMAKE_INSTALL_PREFIX}/${CMAKE_INSTALL_BINDIR}/hipfft-bench${CMAKE_EXECUTABLE_SUFFIX} ) |
| 747 | - set( BENCH_OLD_NAME ${CMAKE_INSTALL_PREFIX}/${CMAKE_INSTALL_BINDIR}/hipfft-rider${CMAKE_EXECUTABLE_SUFFIX} ) |
| 748 | -else() |
| 749 | - set( BENCH_LINK_COMMAND create_symlink ) |
| 750 | - set( BENCH_NEW_NAME hipfft-bench ) |
| 751 | - set( BENCH_OLD_NAME ${CMAKE_INSTALL_PREFIX}/${CMAKE_INSTALL_BINDIR}/hipfft-rider ) |
| 752 | -endif() |
| 753 | -install( |
| 754 | - CODE "execute_process( COMMAND \"${CMAKE_COMMAND}\" -E ${BENCH_LINK_COMMAND} \"${BENCH_NEW_NAME}\" \"${BENCH_OLD_NAME}\" )" |
| 755 | -) |
| 756 | diff --git a/clients/bench/bench.cpp b/clients/bench/bench.cpp |
| 757 | index c9cfed9..52cf888 100644 |
| 758 | --- a/clients/bench/bench.cpp |
| 759 | +++ b/clients/bench/bench.cpp |
| 760 | @@ -31,6 +31,9 @@ |
| 761 | #include "../../shared/client_except.h" |
| 762 | #include "../../shared/gpubuf.h" |
| 763 | |
| 764 | +// initialize static class member of hipfft_params |
| 765 | +std::vector<gpubuf> hipfft_params::externally_managed_workareas = std::vector<gpubuf>(); |
| 766 | + |
| 767 | int main(int argc, char* argv[]) |
| 768 | { |
| 769 | // This helps with mixing output of both wide and narrow characters to the screen |
| 770 | @@ -79,6 +82,11 @@ int main(int argc, char* argv[]) |
| 771 | "forward\n3) real inverse") |
| 772 | ->default_val(fft_transform_type_complex_forward); |
| 773 | non_token |
| 774 | + ->add_option("--auto_allocation", |
| 775 | + params.auto_allocate, |
| 776 | + "hipFFT's auto-allocation behavior: \"on\", \"off\", or \"default\"") |
| 777 | + ->default_val("default"); |
| 778 | + non_token |
| 779 | ->add_option( |
| 780 | "--precision", params.precision, "Transform precision: single (default), double, half") |
| 781 | ->excludes("--double"); |
| 782 | diff --git a/clients/hipfft_params.h b/clients/hipfft_params.h |
| 783 | index e0fc576..f5bcdb4 100644 |
| 784 | --- a/clients/hipfft_params.h |
| 785 | +++ b/clients/hipfft_params.h |
| 786 | @@ -32,12 +32,28 @@ |
| 787 | #include "../shared/hipfft_brick.h" |
| 788 | #include "hipfft/hipfft.h" |
| 789 | #include "hipfft/hipfftXt.h" |
| 790 | +#include <random> |
| 791 | |
| 792 | #ifdef HIPFFT_MPI_ENABLE |
| 793 | #include "hipfft/hipfftMp.h" |
| 794 | #include <mpi.h> |
| 795 | #endif |
| 796 | |
| 797 | +template <typename T, |
| 798 | + typename... Args, |
| 799 | + std::enable_if_t<std::is_integral_v<T> && (std::is_same_v<T, Args> && ...), bool> = true> |
| 800 | +static void set_with_random_nonnegative_values(const std::string& token, T& val, Args&... args) |
| 801 | +{ |
| 802 | + // using a hash of the token as random seed to avoid |
| 803 | + // dependencies on externally-defined variables |
| 804 | + std::hash<std::string> hasher; |
| 805 | + std::ranlux24_base gen(hasher(token)); |
| 806 | + std::uniform_int_distribution<T> dis(static_cast<T>(0), std::numeric_limits<T>::max()); |
| 807 | + val = dis(gen); |
| 808 | + ((args = dis(gen)), ...); |
| 809 | + return; |
| 810 | +} |
| 811 | + |
| 812 | inline fft_status fft_status_from_hipfftparams(const hipfftResult_t val) |
| 813 | { |
| 814 | switch(val) |
| 815 | @@ -141,6 +157,13 @@ public: |
| 816 | std::vector<long long int> ll_inembed; |
| 817 | std::vector<long long int> ll_onembed; |
| 818 | |
| 819 | + template <typename T> |
| 820 | + struct many_api_layout_args |
| 821 | + { |
| 822 | + T *input_embed, *output_embed; |
| 823 | + T input_stride, output_stride, input_distance, output_distance; |
| 824 | + }; |
| 825 | + |
| 826 | struct hipLibXtDesc_deleter |
| 827 | { |
| 828 | void operator()(hipLibXtDesc* d) |
| 829 | @@ -160,22 +183,46 @@ public: |
| 830 | |
| 831 | // backend library can write N worksize values for N GPUs, so |
| 832 | // allocate a vector for that if necessary |
| 833 | - std::vector<size_t> xt_worksize; |
| 834 | + std::vector<size_t> auto_allocated_worksizes; |
| 835 | + // if auto_allocate == fft_auto_allocation_off, the hipFFT plan(s) |
| 836 | + // will be provided with externally-managed work area(s): |
| 837 | + static std::vector<gpubuf> externally_managed_workareas; |
| 838 | |
| 839 | - // pointer we pass to the backend library. By default point to the |
| 840 | - // single-GPU workbuffer size. |
| 841 | - size_t* workbuffersize_ptr; |
| 842 | + size_t auto_allocated_extra_vram_footprint() const |
| 843 | + { |
| 844 | + return std::accumulate(auto_allocated_worksizes.begin(), |
| 845 | + auto_allocated_worksizes.end(), |
| 846 | + static_cast<size_t>(0)); |
| 847 | + } |
| 848 | + |
| 849 | + static size_t externally_managed_extra_vram_footprint() |
| 850 | + { |
| 851 | + return std::accumulate(externally_managed_workareas.begin(), |
| 852 | + externally_managed_workareas.end(), |
| 853 | + static_cast<size_t>(0), |
| 854 | + [](size_t total, const gpubuf& buf) { return total + buf.size(); }); |
| 855 | + } |
| 856 | |
| 857 | - hipfft_params() |
| 858 | + bool is_preventing_auto_allocation_at_generation() const |
| 859 | { |
| 860 | - workbuffersize_ptr = &workbuffersize; |
| 861 | + if(auto_allocate != fft_auto_allocation_off) |
| 862 | + return false; |
| 863 | + // Let hipFFT sometimes auto-allocate nonetheless so that tests cover its |
| 864 | + // ability to free resources (allocated at generation) when/if some |
| 865 | + // externally-managed workarea(s) are provided after plan generation |
| 866 | + // Note: this member function must return the same result even if called |
| 867 | + // more than once by a given instance, it must be stable for any instance |
| 868 | + return std::hash<std::string>()(token()) % 2 == 1; |
| 869 | } |
| 870 | |
| 871 | + hipfft_params() = default; |
| 872 | + |
| 873 | hipfft_params(const fft_params& p) |
| 874 | : fft_params(p) |
| 875 | { |
| 876 | - workbuffersize_ptr = &workbuffersize; |
| 877 | } |
| 878 | + hipfft_params(hipfft_params&& p) = default; |
| 879 | + hipfft_params& operator=(hipfft_params&& other) = default; |
| 880 | |
| 881 | ~hipfft_params() |
| 882 | { |
| 883 | @@ -206,12 +253,14 @@ public: |
| 884 | } |
| 885 | catch(fft_params::work_buffer_alloc_failure& e) |
| 886 | { |
| 887 | - val += workbuffersize; |
| 888 | + val += auto_allocated_extra_vram_footprint(); |
| 889 | + val += externally_managed_extra_vram_footprint(); |
| 890 | std::stringstream msg; |
| 891 | msg << "Plan work buffer size (" << val << " bytes raw data) too large for device"; |
| 892 | throw ROCFFT_SKIP{msg.str()}; |
| 893 | } |
| 894 | - val += workbuffersize; |
| 895 | + val += auto_allocated_extra_vram_footprint(); |
| 896 | + val += externally_managed_extra_vram_footprint(); |
| 897 | return val; |
| 898 | } |
| 899 | |
| 900 | @@ -336,6 +385,11 @@ public: |
| 901 | int_inembed[i] = ll_inembed[i]; |
| 902 | int_onembed[i] = ll_onembed[i]; |
| 903 | } |
| 904 | + // reset auto_allocated_worksizes |
| 905 | + auto_allocated_worksizes.resize(get_num_used_gpus()); |
| 906 | + std::for_each(auto_allocated_worksizes.begin(), |
| 907 | + auto_allocated_worksizes.end(), |
| 908 | + [](decltype(auto_allocated_worksizes)::value_type& val) { val = 0; }); |
| 909 | |
| 910 | hipfftResult ret = HIPFFT_SUCCESS; |
| 911 | return fft_status_from_hipfftparams(ret); |
| 912 | @@ -402,19 +456,47 @@ public: |
| 913 | } |
| 914 | } |
| 915 | |
| 916 | + if(ret == HIPFFT_SUCCESS && auto_allocate == fft_auto_allocation_off) |
| 917 | + { |
| 918 | + ret = set_externally_managed_work_areas(); |
| 919 | + } |
| 920 | + |
| 921 | // hipFFT can fail plan creation due to allocation failure - |
| 922 | // tests are expecting a specific exception in that case, |
| 923 | // because the test was unable to run. Doesn't mean the test |
| 924 | // case failed. |
| 925 | if(ret == HIPFFT_ALLOC_FAILED) |
| 926 | - throw fft_params::work_buffer_alloc_failure( |
| 927 | - "plan create failed due to allocation failure"); |
| 928 | + { |
| 929 | + if(!final_attempt_at_plan_creation && externally_managed_extra_vram_footprint() > 0) |
| 930 | + { |
| 931 | + final_attempt_at_plan_creation = true; |
| 932 | + // device allocation(s) in externally_managed_workareas might be |
| 933 | + // larger than needed or even unnecessary for the instance of interest. |
| 934 | + // Free them up and try again before concluding. |
| 935 | + externally_managed_workareas.clear(); |
| 936 | + return create_plan(); |
| 937 | + } |
| 938 | + else |
| 939 | + { |
| 940 | + throw fft_params::work_buffer_alloc_failure( |
| 941 | + "plan create failed due to allocation failure", |
| 942 | + externally_managed_extra_vram_footprint() |
| 943 | + + auto_allocated_extra_vram_footprint()); |
| 944 | + } |
| 945 | + } |
| 946 | |
| 947 | // store token to check if plan was already made |
| 948 | current_token = token(); |
| 949 | return fft_status_from_hipfftparams(ret); |
| 950 | } |
| 951 | |
| 952 | + hipfftResult_t set_stream(hipStream_t stream) |
| 953 | + { |
| 954 | + if(plan == INVALID_PLAN_HANDLE) |
| 955 | + throw std::runtime_error("Plan must be created before setting a desired stream"); |
| 956 | + return hipfftSetStream(plan, stream); |
| 957 | + } |
| 958 | + |
| 959 | void validate_fields() const override |
| 960 | { |
| 961 | validate_brick_volume(); |
| 962 | @@ -454,10 +536,12 @@ public: |
| 963 | } |
| 964 | } |
| 965 | |
| 966 | - fft_status set_callbacks(void* load_cb_host, |
| 967 | - void* load_cb_data, |
| 968 | - void* store_cb_host, |
| 969 | - void* store_cb_data) override |
| 970 | + fft_status set_callbacks(void* load_cb_host, |
| 971 | + void* load_cb_data, |
| 972 | + void* store_cb_host, |
| 973 | + void* store_cb_data, |
| 974 | + size_t load_cb_shared_mem_bytes = 0, |
| 975 | + size_t store_cb_shared_mem_bytes = 0) override |
| 976 | { |
| 977 | if(run_callbacks) |
| 978 | { |
| 979 | @@ -476,6 +560,16 @@ public: |
| 980 | plan, &store_cb_host, HIPFFT_CB_ST_COMPLEX, &store_cb_data); |
| 981 | if(ret != HIPFFT_SUCCESS) |
| 982 | return fft_status_from_hipfftparams(ret); |
| 983 | + |
| 984 | + ret = hipfftXtSetCallbackSharedSize( |
| 985 | + plan, HIPFFT_CB_LD_REAL, load_cb_shared_mem_bytes); |
| 986 | + if(ret != HIPFFT_SUCCESS) |
| 987 | + return fft_status_from_hipfftparams(ret); |
| 988 | + |
| 989 | + ret = hipfftXtSetCallbackSharedSize( |
| 990 | + plan, HIPFFT_CB_ST_COMPLEX, store_cb_shared_mem_bytes); |
| 991 | + if(ret != HIPFFT_SUCCESS) |
| 992 | + return fft_status_from_hipfftparams(ret); |
| 993 | break; |
| 994 | case HIPFFT_D2Z: |
| 995 | ret = hipfftXtSetCallback( |
| 996 | @@ -487,6 +581,16 @@ public: |
| 997 | plan, &store_cb_host, HIPFFT_CB_ST_COMPLEX_DOUBLE, &store_cb_data); |
| 998 | if(ret != HIPFFT_SUCCESS) |
| 999 | return fft_status_from_hipfftparams(ret); |
| 1000 | + |
| 1001 | + ret = hipfftXtSetCallbackSharedSize( |
| 1002 | + plan, HIPFFT_CB_LD_REAL_DOUBLE, load_cb_shared_mem_bytes); |
| 1003 | + if(ret != HIPFFT_SUCCESS) |
| 1004 | + return fft_status_from_hipfftparams(ret); |
| 1005 | + |
| 1006 | + ret = hipfftXtSetCallbackSharedSize( |
| 1007 | + plan, HIPFFT_CB_ST_COMPLEX_DOUBLE, store_cb_shared_mem_bytes); |
| 1008 | + if(ret != HIPFFT_SUCCESS) |
| 1009 | + return fft_status_from_hipfftparams(ret); |
| 1010 | break; |
| 1011 | case HIPFFT_C2R: |
| 1012 | ret = hipfftXtSetCallback(plan, &load_cb_host, HIPFFT_CB_LD_COMPLEX, &load_cb_data); |
| 1013 | @@ -496,6 +600,16 @@ public: |
| 1014 | ret = hipfftXtSetCallback(plan, &store_cb_host, HIPFFT_CB_ST_REAL, &store_cb_data); |
| 1015 | if(ret != HIPFFT_SUCCESS) |
| 1016 | return fft_status_from_hipfftparams(ret); |
| 1017 | + |
| 1018 | + ret = hipfftXtSetCallbackSharedSize( |
| 1019 | + plan, HIPFFT_CB_LD_COMPLEX, load_cb_shared_mem_bytes); |
| 1020 | + if(ret != HIPFFT_SUCCESS) |
| 1021 | + return fft_status_from_hipfftparams(ret); |
| 1022 | + |
| 1023 | + ret = hipfftXtSetCallbackSharedSize( |
| 1024 | + plan, HIPFFT_CB_ST_REAL, store_cb_shared_mem_bytes); |
| 1025 | + if(ret != HIPFFT_SUCCESS) |
| 1026 | + return fft_status_from_hipfftparams(ret); |
| 1027 | break; |
| 1028 | case HIPFFT_Z2D: |
| 1029 | ret = hipfftXtSetCallback( |
| 1030 | @@ -507,6 +621,16 @@ public: |
| 1031 | plan, &store_cb_host, HIPFFT_CB_ST_REAL_DOUBLE, &store_cb_data); |
| 1032 | if(ret != HIPFFT_SUCCESS) |
| 1033 | return fft_status_from_hipfftparams(ret); |
| 1034 | + |
| 1035 | + ret = hipfftXtSetCallbackSharedSize( |
| 1036 | + plan, HIPFFT_CB_LD_COMPLEX_DOUBLE, load_cb_shared_mem_bytes); |
| 1037 | + if(ret != HIPFFT_SUCCESS) |
| 1038 | + return fft_status_from_hipfftparams(ret); |
| 1039 | + |
| 1040 | + ret = hipfftXtSetCallbackSharedSize( |
| 1041 | + plan, HIPFFT_CB_ST_REAL_DOUBLE, store_cb_shared_mem_bytes); |
| 1042 | + if(ret != HIPFFT_SUCCESS) |
| 1043 | + return fft_status_from_hipfftparams(ret); |
| 1044 | break; |
| 1045 | case HIPFFT_C2C: |
| 1046 | ret = hipfftXtSetCallback(plan, &load_cb_host, HIPFFT_CB_LD_COMPLEX, &load_cb_data); |
| 1047 | @@ -517,6 +641,16 @@ public: |
| 1048 | plan, &store_cb_host, HIPFFT_CB_ST_COMPLEX, &store_cb_data); |
| 1049 | if(ret != HIPFFT_SUCCESS) |
| 1050 | return fft_status_from_hipfftparams(ret); |
| 1051 | + |
| 1052 | + ret = hipfftXtSetCallbackSharedSize( |
| 1053 | + plan, HIPFFT_CB_LD_COMPLEX, load_cb_shared_mem_bytes); |
| 1054 | + if(ret != HIPFFT_SUCCESS) |
| 1055 | + return fft_status_from_hipfftparams(ret); |
| 1056 | + |
| 1057 | + ret = hipfftXtSetCallbackSharedSize( |
| 1058 | + plan, HIPFFT_CB_ST_COMPLEX, store_cb_shared_mem_bytes); |
| 1059 | + if(ret != HIPFFT_SUCCESS) |
| 1060 | + return fft_status_from_hipfftparams(ret); |
| 1061 | break; |
| 1062 | case HIPFFT_Z2Z: |
| 1063 | ret = hipfftXtSetCallback( |
| 1064 | @@ -528,6 +662,16 @@ public: |
| 1065 | plan, &store_cb_host, HIPFFT_CB_ST_COMPLEX_DOUBLE, &store_cb_data); |
| 1066 | if(ret != HIPFFT_SUCCESS) |
| 1067 | return fft_status_from_hipfftparams(ret); |
| 1068 | + |
| 1069 | + ret = hipfftXtSetCallbackSharedSize( |
| 1070 | + plan, HIPFFT_CB_LD_COMPLEX_DOUBLE, load_cb_shared_mem_bytes); |
| 1071 | + if(ret != HIPFFT_SUCCESS) |
| 1072 | + return fft_status_from_hipfftparams(ret); |
| 1073 | + |
| 1074 | + ret = hipfftXtSetCallbackSharedSize( |
| 1075 | + plan, HIPFFT_CB_ST_COMPLEX_DOUBLE, store_cb_shared_mem_bytes); |
| 1076 | + if(ret != HIPFFT_SUCCESS) |
| 1077 | + return fft_status_from_hipfftparams(ret); |
| 1078 | break; |
| 1079 | default: |
| 1080 | throw std::runtime_error("Invalid execution type"); |
| 1081 | @@ -921,17 +1065,301 @@ private: |
| 1082 | CREATE_XT_MAKE_PLAN_MANY, |
| 1083 | }; |
| 1084 | |
| 1085 | + // check that worksize estimates can be successfully queried with or without a valid plan |
| 1086 | + hipfftResult_t check_worksize_estimate() |
| 1087 | + { |
| 1088 | + hipfftResult_t ret{HIPFFT_INTERNAL_ERROR}; |
| 1089 | + if(!hipfft_transform_type) |
| 1090 | + { |
| 1091 | + throw std::runtime_error("Estimating worksize requires a valid type of transform"); |
| 1092 | + } |
| 1093 | + std::vector<size_t> worksize_estimate(get_num_used_gpus(), absurd_init_worksize_estimate); |
| 1094 | + switch(get_create_type()) |
| 1095 | + { |
| 1096 | + case CREATE_MAKE_PLAN_Nd: |
| 1097 | + { |
| 1098 | + switch(dim()) |
| 1099 | + { |
| 1100 | + case 1: |
| 1101 | + if(plan == INVALID_PLAN_HANDLE) |
| 1102 | + ret = hipfftEstimate1d( |
| 1103 | + int_length[0], *hipfft_transform_type, nbatch, worksize_estimate.data()); |
| 1104 | + else |
| 1105 | + ret = hipfftGetSize1d(plan, |
| 1106 | + int_length[0], |
| 1107 | + *hipfft_transform_type, |
| 1108 | + nbatch, |
| 1109 | + worksize_estimate.data()); |
| 1110 | + break; |
| 1111 | + case 2: |
| 1112 | + if(plan == INVALID_PLAN_HANDLE) |
| 1113 | + ret = hipfftEstimate2d(int_length[0], |
| 1114 | + int_length[1], |
| 1115 | + *hipfft_transform_type, |
| 1116 | + worksize_estimate.data()); |
| 1117 | + else |
| 1118 | + ret = hipfftGetSize2d(plan, |
| 1119 | + int_length[0], |
| 1120 | + int_length[1], |
| 1121 | + *hipfft_transform_type, |
| 1122 | + worksize_estimate.data()); |
| 1123 | + break; |
| 1124 | + case 3: |
| 1125 | + if(plan == INVALID_PLAN_HANDLE) |
| 1126 | + ret = hipfftEstimate3d(int_length[0], |
| 1127 | + int_length[1], |
| 1128 | + int_length[2], |
| 1129 | + *hipfft_transform_type, |
| 1130 | + worksize_estimate.data()); |
| 1131 | + else |
| 1132 | + ret = hipfftGetSize3d(plan, |
| 1133 | + int_length[0], |
| 1134 | + int_length[1], |
| 1135 | + int_length[2], |
| 1136 | + *hipfft_transform_type, |
| 1137 | + worksize_estimate.data()); |
| 1138 | + break; |
| 1139 | + default: |
| 1140 | + throw std::runtime_error("invalid dim"); |
| 1141 | + } |
| 1142 | + break; |
| 1143 | + } |
| 1144 | + case CREATE_MAKE_PLAN_MANY: |
| 1145 | + { |
| 1146 | + auto layout_args = make_valid_layout_args_for_plan_many<int>(); |
| 1147 | + if(plan == INVALID_PLAN_HANDLE) |
| 1148 | + ret = hipfftEstimateMany(dim(), |
| 1149 | + int_length.data(), |
| 1150 | + layout_args.input_embed, |
| 1151 | + layout_args.input_stride, |
| 1152 | + layout_args.input_distance, |
| 1153 | + layout_args.output_embed, |
| 1154 | + layout_args.output_stride, |
| 1155 | + layout_args.output_distance, |
| 1156 | + *hipfft_transform_type, |
| 1157 | + nbatch, |
| 1158 | + worksize_estimate.data()); |
| 1159 | + else |
| 1160 | + ret = hipfftGetSizeMany(plan, |
| 1161 | + dim(), |
| 1162 | + int_length.data(), |
| 1163 | + layout_args.input_embed, |
| 1164 | + layout_args.input_stride, |
| 1165 | + layout_args.input_distance, |
| 1166 | + layout_args.output_embed, |
| 1167 | + layout_args.output_stride, |
| 1168 | + layout_args.output_distance, |
| 1169 | + *hipfft_transform_type, |
| 1170 | + nbatch, |
| 1171 | + worksize_estimate.data()); |
| 1172 | + break; |
| 1173 | + } |
| 1174 | + case CREATE_MAKE_PLAN_MANY64: |
| 1175 | + { |
| 1176 | + if(plan == INVALID_PLAN_HANDLE) |
| 1177 | + { |
| 1178 | + // no direct equivalent in estimate-fetching APIs |
| 1179 | + std::for_each(worksize_estimate.begin(), |
| 1180 | + worksize_estimate.end(), |
| 1181 | + [](decltype(worksize_estimate)::value_type& val) { val = 0; }); |
| 1182 | + ret = HIPFFT_SUCCESS; |
| 1183 | + } |
| 1184 | + else |
| 1185 | + { |
| 1186 | + auto layout_args = make_valid_layout_args_for_plan_many<long long>(); |
| 1187 | + ret = hipfftGetSizeMany64(plan, |
| 1188 | + dim(), |
| 1189 | + ll_length.data(), |
| 1190 | + layout_args.input_embed, |
| 1191 | + layout_args.input_stride, |
| 1192 | + layout_args.input_distance, |
| 1193 | + layout_args.output_embed, |
| 1194 | + layout_args.output_stride, |
| 1195 | + layout_args.output_distance, |
| 1196 | + *hipfft_transform_type, |
| 1197 | + nbatch, |
| 1198 | + worksize_estimate.data()); |
| 1199 | + } |
| 1200 | + break; |
| 1201 | + } |
| 1202 | + case CREATE_XT_MAKE_PLAN_MANY: |
| 1203 | + { |
| 1204 | + if(plan == INVALID_PLAN_HANDLE) |
| 1205 | + { |
| 1206 | + // no direct equivalent in estimate-fetching APIs |
| 1207 | + std::for_each(worksize_estimate.begin(), |
| 1208 | + worksize_estimate.end(), |
| 1209 | + [](decltype(worksize_estimate)::value_type& val) { val = 0; }); |
| 1210 | + ret = HIPFFT_SUCCESS; |
| 1211 | + } |
| 1212 | + else |
| 1213 | + { |
| 1214 | + auto executionType = get_xt_api_execution_type(); |
| 1215 | + auto layout_args = make_valid_layout_args_for_plan_many<long long>(); |
| 1216 | + ret = hipfftXtGetSizeMany(plan, |
| 1217 | + dim(), |
| 1218 | + ll_length.data(), |
| 1219 | + layout_args.input_embed, |
| 1220 | + layout_args.input_stride, |
| 1221 | + layout_args.input_distance, |
| 1222 | + inputType, |
| 1223 | + layout_args.output_embed, |
| 1224 | + layout_args.output_stride, |
| 1225 | + layout_args.output_distance, |
| 1226 | + outputType, |
| 1227 | + nbatch, |
| 1228 | + worksize_estimate.data(), |
| 1229 | + executionType); |
| 1230 | + } |
| 1231 | + break; |
| 1232 | + } |
| 1233 | + case PLAN_Nd: |
| 1234 | + case PLAN_MANY: |
| 1235 | + default: |
| 1236 | + { |
| 1237 | + // should be indirectly disabled via get_create_type() |
| 1238 | + return HIPFFT_INTERNAL_ERROR; |
| 1239 | + } |
| 1240 | + } |
| 1241 | + // check that the value(s) of worksize_estimate were actually set, assuming that |
| 1242 | + // setting a worksize_estimate equal to absurd_init_worksize_estimate by hipFFT |
| 1243 | + // cannot be considered "correct". |
| 1244 | + // Note: worksize_estimate value(s) are *not* guaranteed to be greater than or equal |
| 1245 | + // to the actual value(s) of the work area(s), queriable after plan generation via |
| 1246 | + // hipfftGetSize. |
| 1247 | + if(ret == HIPFFT_SUCCESS) |
| 1248 | + { |
| 1249 | + // the estimate can't have any knowledge about the number of GPUs being used if |
| 1250 | + // the plan wasn't created first |
| 1251 | + const size_t num_values_to_check |
| 1252 | + = plan == INVALID_PLAN_HANDLE ? 1 : worksize_estimate.size(); |
| 1253 | + for(size_t idx = 0; ret == HIPFFT_SUCCESS && idx < num_values_to_check; idx++) |
| 1254 | + { |
| 1255 | + ret = worksize_estimate[idx] != absurd_init_worksize_estimate |
| 1256 | + ? HIPFFT_SUCCESS |
| 1257 | + : HIPFFT_INTERNAL_ERROR; |
| 1258 | + } |
| 1259 | + } |
| 1260 | + return ret; |
| 1261 | + } |
| 1262 | + |
| 1263 | + // provide a work area to a successfully generated plan |
| 1264 | + hipfftResult_t set_externally_managed_work_areas() |
| 1265 | + { |
| 1266 | + std::vector<size_t> req_workarea_sizes(get_num_used_gpus(), absurd_init_worksize_estimate); |
| 1267 | + hipfftResult_t ret = hipfftGetSize(plan, req_workarea_sizes.data()); |
| 1268 | + if(ret != HIPFFT_SUCCESS) |
| 1269 | + { |
| 1270 | + return ret; |
| 1271 | + } |
| 1272 | + else if(std::any_of(req_workarea_sizes.begin(), |
| 1273 | + req_workarea_sizes.end(), |
| 1274 | + [](const decltype(req_workarea_sizes)::value_type& val) { |
| 1275 | + return val == absurd_init_worksize_estimate; |
| 1276 | + })) |
| 1277 | + { |
| 1278 | + return HIPFFT_INTERNAL_ERROR; |
| 1279 | + } |
| 1280 | + // req_workarea_sizes are known and validated |
| 1281 | + // check if the current externally_managed_workareas can be used as is or not |
| 1282 | + if(externally_managed_workareas.size() < get_num_used_gpus()) |
| 1283 | + externally_managed_workareas.resize(get_num_used_gpus()); |
| 1284 | + std::vector<void*> workareas(get_num_used_gpus(), nullptr); |
| 1285 | + for(auto workarea_idx = 0; workarea_idx < get_num_used_gpus(); workarea_idx++) |
| 1286 | + { |
| 1287 | + const auto req_size = req_workarea_sizes[workarea_idx]; |
| 1288 | + auto& buf = externally_managed_workareas[workarea_idx]; |
| 1289 | + if(buf.size() < req_size) |
| 1290 | + { |
| 1291 | + // too small, free and reallocate to meet current needs |
| 1292 | + buf.free(); |
| 1293 | + if(buf.alloc(req_size) != hipSuccess) |
| 1294 | + { |
| 1295 | + return HIPFFT_ALLOC_FAILED; |
| 1296 | + } |
| 1297 | + } |
| 1298 | + workareas[workarea_idx] = buf.data(); |
| 1299 | + } |
| 1300 | + if(get_num_used_gpus() > 1) |
| 1301 | + { |
| 1302 | + // TODO: enable below once hipfftXtSetWorkArea is enabled |
| 1303 | +#if(0) |
| 1304 | + ret = hipfftXtSetWorkArea(plan, workareas.data); |
| 1305 | +#else |
| 1306 | + throw unimplemented_exception( |
| 1307 | + "No implementation support for externally-managed work areas with multi-gpu usage"); |
| 1308 | +#endif |
| 1309 | + } |
| 1310 | + else |
| 1311 | + { |
| 1312 | + ret = hipfftSetWorkArea(plan, workareas[0]); |
| 1313 | + } |
| 1314 | + if(ret == HIPFFT_SUCCESS) |
| 1315 | + { |
| 1316 | + // the above "SetWorkArea" frees auto_allocated worksizes (if any) |
| 1317 | + auto_allocated_worksizes.clear(); |
| 1318 | + } |
| 1319 | + return ret; |
| 1320 | + } |
| 1321 | + |
| 1322 | // return true if we need to use hipFFT APIs that separate plan |
| 1323 | // allocation and plan init |
| 1324 | bool need_separate_create_make() const |
| 1325 | { |
| 1326 | - // scale factor and multi-GPU need API calls between create + |
| 1327 | - // init |
| 1328 | - if(scale_factor != 1.0 || multiGPU > 1 || mp_lib != fft_mp_lib_none) |
| 1329 | + // scale factor and multi-GPU and disabled auto-allocation need API |
| 1330 | + // calls between create + init |
| 1331 | + if(scale_factor != 1.0 || multiGPU > 1 || mp_lib != fft_mp_lib_none |
| 1332 | + || auto_allocate == fft_auto_allocation_off) |
| 1333 | return true; |
| 1334 | return false; |
| 1335 | } |
| 1336 | |
| 1337 | + template < |
| 1338 | + typename T, |
| 1339 | + std::enable_if_t<std::is_same_v<T, int> || std::is_same_v<T, long long int>, bool> = true> |
| 1340 | + many_api_layout_args<T> make_valid_layout_args_for_plan_many() |
| 1341 | + { |
| 1342 | + many_api_layout_args<T> ret; |
| 1343 | + if constexpr(std::is_same_v<T, int>) |
| 1344 | + { |
| 1345 | + ret.input_embed = int_inembed.data(); |
| 1346 | + ret.output_embed = int_onembed.data(); |
| 1347 | + } |
| 1348 | + else |
| 1349 | + { |
| 1350 | + ret.input_embed = ll_inembed.data(); |
| 1351 | + ret.output_embed = ll_onembed.data(); |
| 1352 | + } |
| 1353 | + ret.input_stride = static_cast<T>(istride.back()); |
| 1354 | + ret.output_stride = static_cast<T>(ostride.back()); |
| 1355 | + ret.input_distance = static_cast<T>(idist); |
| 1356 | + ret.output_distance = static_cast<T>(odist); |
| 1357 | + if(is_using_default_layout()) |
| 1358 | + { |
| 1359 | + // If using a default layout, users can |
| 1360 | + // (A) either set explicitly inembed, onembed, strides, and distances (like above); |
| 1361 | + // (B) or use nullptr as arguments for inembed and onembed. Strides and |
| 1362 | + // distances are supposed to be ignored in that case. |
| 1363 | + // --> choose randomly between either valid usage when a default layout is |
| 1364 | + // used, so that all possible valid use case scenarios are considered. |
| 1365 | + const std::string test_token = token(); |
| 1366 | + int randomizer; |
| 1367 | + set_with_random_nonnegative_values(test_token, randomizer); |
| 1368 | + if(randomizer % 2 == 0) |
| 1369 | + { |
| 1370 | + ret.input_embed = nullptr; |
| 1371 | + ret.output_embed = nullptr; |
| 1372 | + // FIXME: negative values are not truly ignored for now. |
| 1373 | + set_with_random_nonnegative_values(test_token, |
| 1374 | + ret.input_stride, |
| 1375 | + ret.output_stride, |
| 1376 | + ret.input_distance, |
| 1377 | + ret.output_distance); |
| 1378 | + } |
| 1379 | + } |
| 1380 | + return ret; |
| 1381 | + } |
| 1382 | + |
| 1383 | // Not all plan options work with all creation types. Return a |
| 1384 | // suitable plan creation type for the current FFT parameters. |
| 1385 | int get_create_type() |
| 1386 | @@ -996,15 +1424,16 @@ private: |
| 1387 | } |
| 1388 | hipfftResult_t create_plan_many() |
| 1389 | { |
| 1390 | - auto ret = hipfftPlanMany(&plan, |
| 1391 | + auto layout_args = make_valid_layout_args_for_plan_many<int>(); |
| 1392 | + auto ret = hipfftPlanMany(&plan, |
| 1393 | dim(), |
| 1394 | int_length.data(), |
| 1395 | - int_inembed.data(), |
| 1396 | - istride.back(), |
| 1397 | - idist, |
| 1398 | - int_onembed.data(), |
| 1399 | - ostride.back(), |
| 1400 | - odist, |
| 1401 | + layout_args.input_embed, |
| 1402 | + layout_args.input_stride, |
| 1403 | + layout_args.input_distance, |
| 1404 | + layout_args.output_embed, |
| 1405 | + layout_args.output_stride, |
| 1406 | + layout_args.output_distance, |
| 1407 | *hipfft_transform_type, |
| 1408 | nbatch); |
| 1409 | return ret; |
| 1410 | @@ -1014,7 +1443,14 @@ private: |
| 1411 | // relevant pre-Make APIs (scale factor, XtSetGPUs) |
| 1412 | hipfftResult_t create_with_pre_make() |
| 1413 | { |
| 1414 | - auto ret = hipfftCreate(&plan); |
| 1415 | + hipfftResult_t ret{HIPFFT_INVALID_PLAN}; |
| 1416 | + if(auto_allocate == fft_auto_allocation_off) |
| 1417 | + { |
| 1418 | + ret = check_worksize_estimate(); // read worksize estimate before plan creation |
| 1419 | + if(ret != HIPFFT_SUCCESS) |
| 1420 | + return ret; |
| 1421 | + } |
| 1422 | + ret = hipfftCreate(&plan); |
| 1423 | if(ret != HIPFFT_SUCCESS) |
| 1424 | return ret; |
| 1425 | if(scale_factor != 1.0) |
| 1426 | @@ -1026,7 +1462,8 @@ private: |
| 1427 | if(multiGPU > 1) |
| 1428 | { |
| 1429 | int deviceCount = 0; |
| 1430 | - (void)hipGetDeviceCount(&deviceCount); |
| 1431 | + if(hipGetDeviceCount(&deviceCount) != hipSuccess) |
| 1432 | + throw std::runtime_error("hipGetDeviceCount failed"); |
| 1433 | |
| 1434 | // ensure that users request less than or equal to the total number of devices |
| 1435 | if(static_cast<int>(multiGPU) > deviceCount) |
| 1436 | @@ -1035,9 +1472,8 @@ private: |
| 1437 | std::vector<int> GPUs(multiGPU); |
| 1438 | std::iota(GPUs.begin(), GPUs.end(), 0); |
| 1439 | ret = hipfftXtSetGPUs(plan, static_cast<int>(multiGPU), GPUs.data()); |
| 1440 | - |
| 1441 | - xt_worksize.resize(GPUs.size()); |
| 1442 | - workbuffersize_ptr = xt_worksize.data(); |
| 1443 | + if(ret != HIPFFT_SUCCESS) |
| 1444 | + return ret; |
| 1445 | } |
| 1446 | if(mp_lib == fft_mp_lib_mpi) |
| 1447 | { |
| 1448 | @@ -1090,34 +1526,52 @@ private: |
| 1449 | input_stride.data(), |
| 1450 | output_stride.data()); |
| 1451 | } |
| 1452 | + if(ret != HIPFFT_SUCCESS) |
| 1453 | + return ret; |
| 1454 | #else |
| 1455 | throw std::runtime_error("MPI is not enabled"); |
| 1456 | #endif |
| 1457 | } |
| 1458 | + if(auto_allocate == fft_auto_allocation_off) |
| 1459 | + { |
| 1460 | + ret = check_worksize_estimate(); // read worksize estimate again after plan creation |
| 1461 | + if(ret != HIPFFT_SUCCESS) |
| 1462 | + return ret; |
| 1463 | + } |
| 1464 | + if(is_preventing_auto_allocation_at_generation()) |
| 1465 | + { |
| 1466 | + ret = hipfftSetAutoAllocation(plan, 0); |
| 1467 | + } |
| 1468 | + |
| 1469 | return ret; |
| 1470 | } |
| 1471 | |
| 1472 | hipfftResult_t create_make_plan_Nd() |
| 1473 | { |
| 1474 | auto ret = create_with_pre_make(); |
| 1475 | + |
| 1476 | if(ret != HIPFFT_SUCCESS) |
| 1477 | return ret; |
| 1478 | - |
| 1479 | + // do not register plan's worksizes as "auto-allocated" if auto-allocation was explicitly prevented |
| 1480 | + std::vector<size_t> tmp_worksize(get_num_used_gpus()); |
| 1481 | + size_t* worksize_ptr = is_preventing_auto_allocation_at_generation() |
| 1482 | + ? tmp_worksize.data() |
| 1483 | + : auto_allocated_worksizes.data(); |
| 1484 | switch(dim()) |
| 1485 | { |
| 1486 | case 1: |
| 1487 | return hipfftMakePlan1d( |
| 1488 | - plan, int_length[0], *hipfft_transform_type, nbatch, workbuffersize_ptr); |
| 1489 | + plan, int_length[0], *hipfft_transform_type, nbatch, worksize_ptr); |
| 1490 | case 2: |
| 1491 | return hipfftMakePlan2d( |
| 1492 | - plan, int_length[0], int_length[1], *hipfft_transform_type, workbuffersize_ptr); |
| 1493 | + plan, int_length[0], int_length[1], *hipfft_transform_type, worksize_ptr); |
| 1494 | case 3: |
| 1495 | return hipfftMakePlan3d(plan, |
| 1496 | int_length[0], |
| 1497 | int_length[1], |
| 1498 | int_length[2], |
| 1499 | *hipfft_transform_type, |
| 1500 | - workbuffersize_ptr); |
| 1501 | + worksize_ptr); |
| 1502 | default: |
| 1503 | throw std::runtime_error("invalid dim"); |
| 1504 | } |
| 1505 | @@ -1128,18 +1582,24 @@ private: |
| 1506 | auto ret = create_with_pre_make(); |
| 1507 | if(ret != HIPFFT_SUCCESS) |
| 1508 | return ret; |
| 1509 | + // do not register plan's worksizes as "auto-allocated" if auto-allocation was explicitly prevented |
| 1510 | + std::vector<size_t> tmp_worksize(get_num_used_gpus()); |
| 1511 | + size_t* worksize_ptr = is_preventing_auto_allocation_at_generation() |
| 1512 | + ? tmp_worksize.data() |
| 1513 | + : auto_allocated_worksizes.data(); |
| 1514 | + auto layout_args = make_valid_layout_args_for_plan_many<int>(); |
| 1515 | return hipfftMakePlanMany(plan, |
| 1516 | dim(), |
| 1517 | int_length.data(), |
| 1518 | - int_inembed.data(), |
| 1519 | - istride.back(), |
| 1520 | - idist, |
| 1521 | - int_onembed.data(), |
| 1522 | - ostride.back(), |
| 1523 | - odist, |
| 1524 | + layout_args.input_embed, |
| 1525 | + layout_args.input_stride, |
| 1526 | + layout_args.input_distance, |
| 1527 | + layout_args.output_embed, |
| 1528 | + layout_args.output_stride, |
| 1529 | + layout_args.output_distance, |
| 1530 | *hipfft_transform_type, |
| 1531 | nbatch, |
| 1532 | - workbuffersize_ptr); |
| 1533 | + worksize_ptr); |
| 1534 | } |
| 1535 | |
| 1536 | hipfftResult_t create_make_plan_many64() |
| 1537 | @@ -1147,58 +1607,85 @@ private: |
| 1538 | auto ret = create_with_pre_make(); |
| 1539 | if(ret != HIPFFT_SUCCESS) |
| 1540 | return ret; |
| 1541 | + |
| 1542 | + // do not register plan's worksizes as "auto-allocated" if auto-allocation was explicitly prevented |
| 1543 | + std::vector<size_t> tmp_worksize(get_num_used_gpus()); |
| 1544 | + size_t* worksize_ptr = is_preventing_auto_allocation_at_generation() |
| 1545 | + ? tmp_worksize.data() |
| 1546 | + : auto_allocated_worksizes.data(); |
| 1547 | + auto layout_args = make_valid_layout_args_for_plan_many<long long int>(); |
| 1548 | return hipfftMakePlanMany64(plan, |
| 1549 | dim(), |
| 1550 | ll_length.data(), |
| 1551 | - ll_inembed.data(), |
| 1552 | - istride.back(), |
| 1553 | - idist, |
| 1554 | - ll_onembed.data(), |
| 1555 | - ostride.back(), |
| 1556 | - odist, |
| 1557 | + layout_args.input_embed, |
| 1558 | + layout_args.input_stride, |
| 1559 | + layout_args.input_distance, |
| 1560 | + layout_args.output_embed, |
| 1561 | + layout_args.output_stride, |
| 1562 | + layout_args.output_distance, |
| 1563 | *hipfft_transform_type, |
| 1564 | nbatch, |
| 1565 | - workbuffersize_ptr); |
| 1566 | + worksize_ptr); |
| 1567 | } |
| 1568 | |
| 1569 | - hipfftResult_t create_xt_make_plan_many() |
| 1570 | + hipDataType get_xt_api_execution_type() const |
| 1571 | { |
| 1572 | - auto ret = create_with_pre_make(); |
| 1573 | - if(ret != HIPFFT_SUCCESS) |
| 1574 | - return ret; |
| 1575 | - |
| 1576 | // execution type is always complex, matching the precision |
| 1577 | // of the transform |
| 1578 | // Initializing as double by default |
| 1579 | - hipDataType executionType = HIP_C_64F; |
| 1580 | + hipDataType ret = HIP_C_64F; |
| 1581 | switch(precision) |
| 1582 | { |
| 1583 | case fft_precision_half: |
| 1584 | - executionType = HIP_C_16F; |
| 1585 | + ret = HIP_C_16F; |
| 1586 | break; |
| 1587 | case fft_precision_single: |
| 1588 | - executionType = HIP_C_32F; |
| 1589 | + ret = HIP_C_32F; |
| 1590 | break; |
| 1591 | case fft_precision_double: |
| 1592 | - executionType = HIP_C_64F; |
| 1593 | + ret = HIP_C_64F; |
| 1594 | break; |
| 1595 | + default: |
| 1596 | + throw std::runtime_error("Invalid precision"); |
| 1597 | } |
| 1598 | + return ret; |
| 1599 | + } |
| 1600 | |
| 1601 | + hipfftResult_t create_xt_make_plan_many() |
| 1602 | + { |
| 1603 | + auto ret = create_with_pre_make(); |
| 1604 | + if(ret != HIPFFT_SUCCESS) |
| 1605 | + return ret; |
| 1606 | + |
| 1607 | + // do not register plan's worksizes as "auto-allocated" if auto-allocation was explicitly prevented |
| 1608 | + std::vector<size_t> tmp_worksize(get_num_used_gpus()); |
| 1609 | + size_t* worksize_ptr = is_preventing_auto_allocation_at_generation() |
| 1610 | + ? tmp_worksize.data() |
| 1611 | + : auto_allocated_worksizes.data(); |
| 1612 | + auto executionType = get_xt_api_execution_type(); |
| 1613 | + auto layout_args = make_valid_layout_args_for_plan_many<long long int>(); |
| 1614 | return hipfftXtMakePlanMany(plan, |
| 1615 | dim(), |
| 1616 | ll_length.data(), |
| 1617 | - ll_inembed.data(), |
| 1618 | - istride.back(), |
| 1619 | - idist, |
| 1620 | + layout_args.input_embed, |
| 1621 | + layout_args.input_stride, |
| 1622 | + layout_args.input_distance, |
| 1623 | inputType, |
| 1624 | - ll_onembed.data(), |
| 1625 | - ostride.back(), |
| 1626 | - odist, |
| 1627 | + layout_args.output_embed, |
| 1628 | + layout_args.output_stride, |
| 1629 | + layout_args.output_distance, |
| 1630 | outputType, |
| 1631 | nbatch, |
| 1632 | - workbuffersize_ptr, |
| 1633 | + worksize_ptr, |
| 1634 | executionType); |
| 1635 | } |
| 1636 | + static constexpr size_t absurd_init_worksize_estimate = std::numeric_limits<size_t>::max(); |
| 1637 | + bool final_attempt_at_plan_creation = false; |
| 1638 | + |
| 1639 | + size_t get_num_used_gpus() const |
| 1640 | + { |
| 1641 | + return multiGPU > 1 ? multiGPU : 1; |
| 1642 | + }; |
| 1643 | }; |
| 1644 | |
| 1645 | #endif |
| 1646 | diff --git a/clients/hipfftw_helper.h b/clients/hipfftw_helper.h |
| 1647 | new file mode 100644 |
| 1648 | index 0000000..3e86134 |
| 1649 | --- /dev/null |
| 1650 | +++ b/clients/hipfftw_helper.h |
| 1651 | @@ -0,0 +1,1238 @@ |
| 1652 | +// Copyright (C) 2025 Advanced Micro Devices, Inc. All rights reserved. |
| 1653 | +// |
| 1654 | +// Permission is hereby granted, free of charge, to any person obtaining a copy |
| 1655 | +// of this software and associated documentation files (the "Software"), to deal |
| 1656 | +// in the Software without restriction, including without limitation the rights |
| 1657 | +// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell |
| 1658 | +// copies of the Software, and to permit persons to whom the Software is |
| 1659 | +// furnished to do so, subject to the following conditions: |
| 1660 | +// |
| 1661 | +// The above copyright notice and this permission notice shall be included in |
| 1662 | +// all copies or substantial portions of the Software. |
| 1663 | +// |
| 1664 | +// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR |
| 1665 | +// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, |
| 1666 | +// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE |
| 1667 | +// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER |
| 1668 | +// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, |
| 1669 | +// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN |
| 1670 | +// THE SOFTWARE. |
| 1671 | + |
| 1672 | +#ifndef HIPFFTW_HELPER_H |
| 1673 | +#define HIPFFTW_HELPER_H |
| 1674 | + |
| 1675 | +#include "../shared/environment.h" |
| 1676 | +#include "../shared/fft_params.h" |
| 1677 | +#include <algorithm> |
| 1678 | +#include <fftw3.h> |
| 1679 | +#include <memory> |
| 1680 | +#include <sstream> |
| 1681 | +#include <stdexcept> |
| 1682 | +#include <string> |
| 1683 | +#include <type_traits> |
| 1684 | + |
| 1685 | +#ifdef WIN32 |
| 1686 | +#include <windows.h> |
| 1687 | +// psapi.h requires windows.h to be included first |
| 1688 | +#include <psapi.h> |
| 1689 | +typedef HMODULE LIB_HANDLE_T; |
| 1690 | +#else |
| 1691 | +#include <dlfcn.h> |
| 1692 | +#include <link.h> |
| 1693 | +typedef void* LIB_HANDLE_T; |
| 1694 | +#endif |
| 1695 | + |
| 1696 | +template <fft_precision prec> |
| 1697 | +struct hipfftw_trait; |
| 1698 | +template <> |
| 1699 | +struct hipfftw_trait<fft_precision_single> |
| 1700 | +{ |
| 1701 | + using plan_t = fftwf_plan; |
| 1702 | + using complex_t = fftwf_complex; |
| 1703 | + using real_t = float; |
| 1704 | +}; |
| 1705 | +template <> |
| 1706 | +struct hipfftw_trait<fft_precision_double> |
| 1707 | +{ |
| 1708 | + using plan_t = fftw_plan; |
| 1709 | + using complex_t = fftw_complex; |
| 1710 | + using real_t = double; |
| 1711 | +}; |
| 1712 | + |
| 1713 | +template <fft_precision prec> |
| 1714 | +using hipfftw_real_t = typename hipfftw_trait<prec>::real_t; |
| 1715 | +template <fft_precision prec> |
| 1716 | +using hipfftw_complex_t = typename hipfftw_trait<prec>::complex_t; |
| 1717 | +template <fft_precision prec> |
| 1718 | +using hipfftw_plan_t = typename hipfftw_trait<prec>::plan_t; |
| 1719 | + |
| 1720 | +// singleton class encapsulating the dynamically-loaded hipfftw library |
| 1721 | +class dynamically_loaded_hipfftw |
| 1722 | +{ |
| 1723 | +private: |
| 1724 | + LIB_HANDLE_T lib_handle; |
| 1725 | + std::ostringstream load_error_info; |
| 1726 | + |
| 1727 | + dynamically_loaded_hipfftw() |
| 1728 | + { |
| 1729 | +#ifdef __HIP_PLATFORM_AMD__ |
| 1730 | + const std::string lib_basename = "hipfftw"; |
| 1731 | +#else |
| 1732 | + const std::string lib_basename = "cufftw"; |
| 1733 | +#endif |
| 1734 | +#ifdef WIN32 |
| 1735 | + const std::string lib_fullame = lib_basename + ".dll"; |
| 1736 | + lib_handle = LoadLibraryA(lib_fullame.c_str()); |
| 1737 | +#else |
| 1738 | + const std::string lib_fullame = "lib" + lib_basename + ".so"; |
| 1739 | + lib_handle = dlopen(lib_fullame.c_str(), RTLD_LAZY); |
| 1740 | +#endif |
| 1741 | + load_error_info.clear(); |
| 1742 | + if(!lib_handle) |
| 1743 | + { |
| 1744 | + load_error_info << "failed to open library " << lib_fullame; |
| 1745 | +#ifdef WIN32 |
| 1746 | + load_error_info << ". System's error code = " << GetLastError(); |
| 1747 | +#else |
| 1748 | + load_error_info << ". System's error message = " << dlerror(); |
| 1749 | +#endif |
| 1750 | + // do not throw from here to ease exception handling |
| 1751 | + } |
| 1752 | + } |
| 1753 | + /* disable copies and moves */ |
| 1754 | + dynamically_loaded_hipfftw(const dynamically_loaded_hipfftw&) = delete; |
| 1755 | + dynamically_loaded_hipfftw(dynamically_loaded_hipfftw&&) = delete; |
| 1756 | + dynamically_loaded_hipfftw& operator=(const dynamically_loaded_hipfftw&) = delete; |
| 1757 | + dynamically_loaded_hipfftw& operator=(dynamically_loaded_hipfftw&&) = delete; |
| 1758 | + |
| 1759 | + static const dynamically_loaded_hipfftw& get_instance() |
| 1760 | + { |
| 1761 | + static dynamically_loaded_hipfftw singleton_instance; |
| 1762 | + return singleton_instance; |
| 1763 | + } |
| 1764 | + |
| 1765 | +public: |
| 1766 | + static LIB_HANDLE_T get_lib() |
| 1767 | + { |
| 1768 | + return get_instance().lib_handle; |
| 1769 | + } |
| 1770 | + static std::string get_load_error_info() |
| 1771 | + { |
| 1772 | + return get_instance().load_error_info.str(); |
| 1773 | + } |
| 1774 | + ~dynamically_loaded_hipfftw() |
| 1775 | + { |
| 1776 | + if(lib_handle) |
| 1777 | + { |
| 1778 | +#ifdef WIN32 |
| 1779 | + (void)FreeLibrary(lib_handle); |
| 1780 | +#else |
| 1781 | + (void)dlclose(lib_handle); |
| 1782 | +#endif |
| 1783 | + } |
| 1784 | + lib_handle = nullptr; |
| 1785 | + } |
| 1786 | +}; |
| 1787 | + |
| 1788 | +// exception specific to issues when loading hipfftw and/or when fetching |
| 1789 | +// the address of the supposedly-available functions therefrom |
| 1790 | +struct hipfftw_undefined_function_ptr : std::runtime_error |
| 1791 | +{ |
| 1792 | + using std::runtime_error::runtime_error; |
| 1793 | +}; |
| 1794 | + |
| 1795 | +// helper struct for retrieving a function's return type |
| 1796 | +template <class T> |
| 1797 | +struct func_ret; |
| 1798 | +template <typename R, class... Args> |
| 1799 | +struct func_ret<R(Args...)> |
| 1800 | +{ |
| 1801 | + using type = R; |
| 1802 | +}; |
| 1803 | +template <class T> |
| 1804 | +using func_ret_t = typename func_ret<T>::type; |
| 1805 | + |
| 1806 | +template <typename func_type, std::enable_if_t<std::is_function_v<func_type>, bool> = true> |
| 1807 | +struct dynamically_loaded_function_t |
| 1808 | +{ |
| 1809 | +private: |
| 1810 | + // address of the desired function, to be fetched from a dynamically loaded shared library |
| 1811 | + func_type* func_ptr; |
| 1812 | + // address of the reference function (from linked fftw3) |
| 1813 | + func_type* const reference_func_ptr; |
| 1814 | + // symbol of said function |
| 1815 | + std::string func_symbol; |
| 1816 | + |
| 1817 | +public: |
| 1818 | + dynamically_loaded_function_t(const char* symbol, func_type* ref_func_address) |
| 1819 | + : func_ptr(nullptr) |
| 1820 | + , reference_func_ptr(ref_func_address) |
| 1821 | + , func_symbol(symbol) |
| 1822 | + { |
| 1823 | + } |
| 1824 | + |
| 1825 | + // forwarding functional calls |
| 1826 | + template <typename... Args> |
| 1827 | + func_ret_t<func_type> operator()(Args... args) const |
| 1828 | + { |
| 1829 | + if(!may_be_used()) |
| 1830 | + throw hipfftw_undefined_function_ptr(dynamically_loaded_hipfftw::get_load_error_info()); |
| 1831 | + return func_ptr(args...); |
| 1832 | + } |
| 1833 | + template <bool call_reference, typename... Args> |
| 1834 | + func_ret_t<func_type> call(Args... args) const |
| 1835 | + { |
| 1836 | + if constexpr(!call_reference) |
| 1837 | + { |
| 1838 | + return this->operator()(args...); |
| 1839 | + } |
| 1840 | + else |
| 1841 | + { |
| 1842 | + if(!reference_func_ptr) |
| 1843 | + throw hipfftw_undefined_function_ptr( |
| 1844 | + "Ill-defined reference function pointer for symbol " + func_symbol); |
| 1845 | + return reference_func_ptr(args...); |
| 1846 | + } |
| 1847 | + // unreachable |
| 1848 | + } |
| 1849 | + void load_implementation() |
| 1850 | + { |
| 1851 | + const auto hipfftw_lib = dynamically_loaded_hipfftw::get_lib(); |
| 1852 | + if(!hipfftw_lib) |
| 1853 | + { |
| 1854 | + // make func_ptr unambiguously unset to force the dedicated exception |
| 1855 | + // to be thrown at forwarded functional call(s) |
| 1856 | + func_ptr = nullptr; |
| 1857 | + return; |
| 1858 | + } |
| 1859 | +#ifdef WIN32 |
| 1860 | + func_ptr = reinterpret_cast<func_type*>(GetProcAddress(hipfftw_lib, func_symbol.c_str())); |
| 1861 | +#else |
| 1862 | + func_ptr = reinterpret_cast<func_type*>(dlsym(hipfftw_lib, func_symbol.c_str())); |
| 1863 | +#endif |
| 1864 | + } |
| 1865 | + bool may_be_used() const |
| 1866 | + { |
| 1867 | + return func_ptr != nullptr; |
| 1868 | + } |
| 1869 | + std::string get_symbol() const |
| 1870 | + { |
| 1871 | + return func_symbol; |
| 1872 | + } |
| 1873 | +}; |
| 1874 | + |
| 1875 | +template <typename T, typename... Args> |
| 1876 | +static void load_implementations(dynamically_loaded_function_t<T>& first, Args&... others) |
| 1877 | +{ |
| 1878 | + first.load_implementation(); |
| 1879 | + if constexpr(sizeof...(others) > 0) |
| 1880 | + load_implementations(others...); |
| 1881 | +} |
| 1882 | + |
| 1883 | +// define singleton structures encapsulating all the hipfftw function |
| 1884 | +// pointers (one specialization per supported precision) |
| 1885 | +template <fft_precision prec> |
| 1886 | +struct hipfftw_funcs; |
| 1887 | + |
| 1888 | +#define HIPFFTW_STRINGIFY(x) #x |
| 1889 | +#define HIPFFTW_DECLARE_DYNAMICALLY_LOADED_FUNCTION_POINTER(prefix, func) \ |
| 1890 | + dynamically_loaded_function_t<decltype(prefix##func)> func \ |
| 1891 | + = dynamically_loaded_function_t<decltype(prefix##func)>(HIPFFTW_STRINGIFY(prefix##func), \ |
| 1892 | + &(prefix##func)); |
| 1893 | + |
| 1894 | +#define HIPFFTW_FUNCS_SPECIALIZATION(prefix, specialization) \ |
| 1895 | + template <> \ |
| 1896 | + struct hipfftw_funcs<specialization> \ |
| 1897 | + { \ |
| 1898 | + private: \ |
| 1899 | + hipfftw_funcs() \ |
| 1900 | + { \ |
| 1901 | + load_implementations(malloc, \ |
| 1902 | + alloc_real, \ |
| 1903 | + alloc_complex, \ |
| 1904 | + free, \ |
| 1905 | + destroy_plan, \ |
| 1906 | + cleanup, \ |
| 1907 | + execute, \ |
| 1908 | + plan_dft_1d, \ |
| 1909 | + plan_dft_2d, \ |
| 1910 | + plan_dft_3d, \ |
| 1911 | + plan_dft, \ |
| 1912 | + plan_dft_r2c_1d, \ |
| 1913 | + plan_dft_r2c_2d, \ |
| 1914 | + plan_dft_r2c_3d, \ |
| 1915 | + plan_dft_r2c, \ |
| 1916 | + plan_dft_c2r_1d, \ |
| 1917 | + plan_dft_c2r_2d, \ |
| 1918 | + plan_dft_c2r_3d, \ |
| 1919 | + plan_dft_c2r, \ |
| 1920 | + print_plan, \ |
| 1921 | + set_timelimit, \ |
| 1922 | + cost, \ |
| 1923 | + flops); \ |
| 1924 | + } \ |
| 1925 | + /* disable copies and moves */ \ |
| 1926 | + hipfftw_funcs(const hipfftw_funcs&) = delete; \ |
| 1927 | + hipfftw_funcs& operator=(const hipfftw_funcs&) = delete; \ |
| 1928 | + hipfftw_funcs(hipfftw_funcs&&) = delete; \ |
| 1929 | + hipfftw_funcs& operator=(hipfftw_funcs&&) = delete; \ |
| 1930 | + \ |
| 1931 | + public: \ |
| 1932 | + HIPFFTW_DECLARE_DYNAMICALLY_LOADED_FUNCTION_POINTER(prefix, malloc) \ |
| 1933 | + HIPFFTW_DECLARE_DYNAMICALLY_LOADED_FUNCTION_POINTER(prefix, alloc_real) \ |
| 1934 | + HIPFFTW_DECLARE_DYNAMICALLY_LOADED_FUNCTION_POINTER(prefix, alloc_complex) \ |
| 1935 | + HIPFFTW_DECLARE_DYNAMICALLY_LOADED_FUNCTION_POINTER(prefix, free) \ |
| 1936 | + HIPFFTW_DECLARE_DYNAMICALLY_LOADED_FUNCTION_POINTER(prefix, destroy_plan) \ |
| 1937 | + HIPFFTW_DECLARE_DYNAMICALLY_LOADED_FUNCTION_POINTER(prefix, cleanup) \ |
| 1938 | + HIPFFTW_DECLARE_DYNAMICALLY_LOADED_FUNCTION_POINTER(prefix, execute) \ |
| 1939 | + HIPFFTW_DECLARE_DYNAMICALLY_LOADED_FUNCTION_POINTER(prefix, plan_dft_1d) \ |
| 1940 | + HIPFFTW_DECLARE_DYNAMICALLY_LOADED_FUNCTION_POINTER(prefix, plan_dft_2d) \ |
| 1941 | + HIPFFTW_DECLARE_DYNAMICALLY_LOADED_FUNCTION_POINTER(prefix, plan_dft_3d) \ |
| 1942 | + HIPFFTW_DECLARE_DYNAMICALLY_LOADED_FUNCTION_POINTER(prefix, plan_dft) \ |
| 1943 | + HIPFFTW_DECLARE_DYNAMICALLY_LOADED_FUNCTION_POINTER(prefix, plan_dft_r2c_1d) \ |
| 1944 | + HIPFFTW_DECLARE_DYNAMICALLY_LOADED_FUNCTION_POINTER(prefix, plan_dft_r2c_2d) \ |
| 1945 | + HIPFFTW_DECLARE_DYNAMICALLY_LOADED_FUNCTION_POINTER(prefix, plan_dft_r2c_3d) \ |
| 1946 | + HIPFFTW_DECLARE_DYNAMICALLY_LOADED_FUNCTION_POINTER(prefix, plan_dft_r2c) \ |
| 1947 | + HIPFFTW_DECLARE_DYNAMICALLY_LOADED_FUNCTION_POINTER(prefix, plan_dft_c2r_1d) \ |
| 1948 | + HIPFFTW_DECLARE_DYNAMICALLY_LOADED_FUNCTION_POINTER(prefix, plan_dft_c2r_2d) \ |
| 1949 | + HIPFFTW_DECLARE_DYNAMICALLY_LOADED_FUNCTION_POINTER(prefix, plan_dft_c2r_3d) \ |
| 1950 | + HIPFFTW_DECLARE_DYNAMICALLY_LOADED_FUNCTION_POINTER(prefix, plan_dft_c2r) \ |
| 1951 | + HIPFFTW_DECLARE_DYNAMICALLY_LOADED_FUNCTION_POINTER(prefix, print_plan) \ |
| 1952 | + HIPFFTW_DECLARE_DYNAMICALLY_LOADED_FUNCTION_POINTER(prefix, set_timelimit) \ |
| 1953 | + HIPFFTW_DECLARE_DYNAMICALLY_LOADED_FUNCTION_POINTER(prefix, cost) \ |
| 1954 | + HIPFFTW_DECLARE_DYNAMICALLY_LOADED_FUNCTION_POINTER(prefix, flops) \ |
| 1955 | + static const hipfftw_funcs& get_instance() \ |
| 1956 | + { \ |
| 1957 | + static const hipfftw_funcs instance; \ |
| 1958 | + return instance; \ |
| 1959 | + } \ |
| 1960 | + } |
| 1961 | + |
| 1962 | +HIPFFTW_FUNCS_SPECIALIZATION(fftwf_, fft_precision_single); |
| 1963 | +HIPFFTW_FUNCS_SPECIALIZATION(fftw_, fft_precision_double); |
| 1964 | + |
| 1965 | +// structure enabling verbosity for hipfftw's exception handler and redirecting std::cerr |
| 1966 | +// to a runtime buffer throughout its lifetime (unless it was already enabled prior/externally) |
| 1967 | +struct hipfftw_exception_logger |
| 1968 | +{ |
| 1969 | + bool active; |
| 1970 | + std::stringstream buffer; |
| 1971 | + std::streambuf* const original_cerr_rdbuf = nullptr; |
| 1972 | + |
| 1973 | + std::unique_ptr<EnvironmentSetTemp> hipfftw_temp_logger_env; |
| 1974 | + |
| 1975 | +public: |
| 1976 | + hipfftw_exception_logger() |
| 1977 | + : active(false) |
| 1978 | + , original_cerr_rdbuf(std::cerr.rdbuf()) |
| 1979 | + { |
| 1980 | +#ifdef __HIP_PLATFORM_AMD__ |
| 1981 | + const auto env_val = rocfft_getenv("HIPFFTW_LOG_EXCEPTIONS"); |
| 1982 | + // activate temporary redirection only if not already used otherwise |
| 1983 | + // (e.g., in test user's environment ) |
| 1984 | + if(env_val.empty() || std::stoull(env_val) == 0) |
| 1985 | + { |
| 1986 | + hipfftw_temp_logger_env |
| 1987 | + = std::make_unique<EnvironmentSetTemp>("HIPFFTW_LOG_EXCEPTIONS", "1"); |
| 1988 | + const auto temp_env_val = rocfft_getenv("HIPFFTW_LOG_EXCEPTIONS"); |
| 1989 | + active = !temp_env_val.empty() && std::stoull(temp_env_val) != 0; |
| 1990 | + } |
| 1991 | +#endif |
| 1992 | + if(active) |
| 1993 | + std::cerr.rdbuf(buffer.rdbuf()); |
| 1994 | + } |
| 1995 | + hipfftw_exception_logger(const hipfftw_exception_logger&) = delete; |
| 1996 | + hipfftw_exception_logger(hipfftw_exception_logger&&) = delete; |
| 1997 | + hipfftw_exception_logger& operator=(const hipfftw_exception_logger&) = delete; |
| 1998 | + hipfftw_exception_logger& operator=(hipfftw_exception_logger&&) = delete; |
| 1999 | + ~hipfftw_exception_logger() |
| 2000 | + { |
| 2001 | + if(active) |
| 2002 | + { |
| 2003 | + // restore cerr to its original state |
| 2004 | + std::cerr.rdbuf(original_cerr_rdbuf); |
| 2005 | + } |
| 2006 | + } |
| 2007 | + bool is_active() const |
| 2008 | + { |
| 2009 | + return active; |
| 2010 | + } |
| 2011 | + std::string get_log() const |
| 2012 | + { |
| 2013 | + return buffer.str(); |
| 2014 | + } |
| 2015 | +}; |
| 2016 | + |
| 2017 | +// bit-flagging enum used for representing (combinations of) plan creation |
| 2018 | +// function(s) to consider |
| 2019 | +enum hipfftw_plan_creation_func : unsigned |
| 2020 | +{ |
| 2021 | + NONE = 0x0, // not to be used (exceptfor validating values) |
| 2022 | + PLAN_DFT_ND = 0x1 << 0, |
| 2023 | + PLAN_DFT = 0x1 << 1, |
| 2024 | + PLAN_MANY = 0x1 << 2, |
| 2025 | + PLAN_GURU = 0x1 << 3, |
| 2026 | + PLAN_GURU64 = 0x1 << 4, |
| 2027 | + ANY = PLAN_DFT_ND | PLAN_DFT | PLAN_MANY | PLAN_GURU | PLAN_GURU64 |
| 2028 | +}; |
| 2029 | +static const std::vector<hipfftw_plan_creation_func> hipfftw_plan_creation_func_candidates |
| 2030 | + = {hipfftw_plan_creation_func::PLAN_DFT_ND, |
| 2031 | + hipfftw_plan_creation_func::PLAN_DFT, |
| 2032 | + hipfftw_plan_creation_func::PLAN_MANY, |
| 2033 | + hipfftw_plan_creation_func::PLAN_GURU, |
| 2034 | + hipfftw_plan_creation_func::PLAN_GURU64}; |
| 2035 | + |
| 2036 | +static bool hipfftw_creation_options_are_well_defined(hipfftw_plan_creation_func creation_options) |
| 2037 | +{ |
| 2038 | + return creation_options == (creation_options & hipfftw_plan_creation_func::ANY); |
| 2039 | +} |
| 2040 | + |
| 2041 | +static std::string hipfftw_creation_options_to_string(hipfftw_plan_creation_func creation_options, |
| 2042 | + fft_transform_type dft_type, |
| 2043 | + int intended_rank) |
| 2044 | +{ |
| 2045 | + if(!hipfftw_creation_options_are_well_defined(creation_options)) |
| 2046 | + throw std::invalid_argument( |
| 2047 | + "invalid creation_options for hipfftw_creation_options_to_string"); |
| 2048 | + if(creation_options == hipfftw_plan_creation_func::NONE) |
| 2049 | + return "none"; |
| 2050 | + if(creation_options == hipfftw_plan_creation_func::ANY) |
| 2051 | + return "any"; |
| 2052 | + if(std::find(hipfftw_plan_creation_func_candidates.begin(), |
| 2053 | + hipfftw_plan_creation_func_candidates.end(), |
| 2054 | + creation_options) |
| 2055 | + == hipfftw_plan_creation_func_candidates.end()) |
| 2056 | + { |
| 2057 | + // 2 or more qualifying candidates flagged in creation_options |
| 2058 | + std::string ret; |
| 2059 | + for(auto candidate : hipfftw_plan_creation_func_candidates) |
| 2060 | + { |
| 2061 | + if(creation_options & candidate) |
| 2062 | + { |
| 2063 | + if(!ret.empty()) |
| 2064 | + ret += "_or_"; |
| 2065 | + ret += hipfftw_creation_options_to_string(candidate, dft_type, intended_rank); |
| 2066 | + } |
| 2067 | + } |
| 2068 | + return ret; |
| 2069 | + } |
| 2070 | + // creation_options is one unique qualifying candidate |
| 2071 | + std::ostringstream ret; |
| 2072 | + const std::string real_or_empty_qualifier |
| 2073 | + = is_real(dft_type) ? (is_fwd(dft_type) ? "_r2c" : "_c2r") : ""; |
| 2074 | + switch(creation_options) |
| 2075 | + { |
| 2076 | + case hipfftw_plan_creation_func::PLAN_DFT_ND: |
| 2077 | + ret << "plan_dft" << real_or_empty_qualifier << "_" << (intended_rank < 0 ? "negative" : "") |
| 2078 | + << std::abs(intended_rank) << "d"; |
| 2079 | + break; |
| 2080 | + case hipfftw_plan_creation_func::PLAN_DFT: |
| 2081 | + ret << "plan_dft" << real_or_empty_qualifier; |
| 2082 | + break; |
| 2083 | + case hipfftw_plan_creation_func::PLAN_MANY: |
| 2084 | + ret << "plan_many_dft" << real_or_empty_qualifier; |
| 2085 | + break; |
| 2086 | + case hipfftw_plan_creation_func::PLAN_GURU: |
| 2087 | + ret << "plan_guru_dft" << real_or_empty_qualifier; |
| 2088 | + break; |
| 2089 | + case hipfftw_plan_creation_func::PLAN_GURU64: |
| 2090 | + ret << "plan_guru64_dft" << real_or_empty_qualifier; |
| 2091 | + break; |
| 2092 | + default: |
| 2093 | + throw std::runtime_error("hipfftw_creation_options_to_string: internal error encountered " |
| 2094 | + "(unexpected value for creation_options)"); |
| 2095 | + break; |
| 2096 | + } |
| 2097 | + return ret.str(); |
| 2098 | +} |
| 2099 | + |
| 2100 | +template < |
| 2101 | + fft_precision prec, |
| 2102 | + std::enable_if_t<prec == fft_precision_single || prec == fft_precision_double, bool> = true> |
| 2103 | +struct hipfftw_plan_bundle_t |
| 2104 | +{ |
| 2105 | +private: |
| 2106 | + const decltype(hipfftw_funcs<prec>::destroy_plan)& plan_destructor; |
| 2107 | + |
| 2108 | +public: |
| 2109 | + hipfftw_plan_t<prec> plan; |
| 2110 | + std::pair<void*, void*> creation_io; // not owned |
| 2111 | + hipfftw_plan_creation_func creation_func; |
| 2112 | + std::string plan_token; // <-- plan details, except for creation io data pointers |
| 2113 | + hipfftw_plan_bundle_t(decltype(plan_destructor) plan_destructor_func) |
| 2114 | + : plan_destructor(plan_destructor_func) |
| 2115 | + , plan(nullptr) |
| 2116 | + , creation_io({nullptr, nullptr}) |
| 2117 | + , creation_func(hipfftw_plan_creation_func::NONE) |
| 2118 | + , plan_token("") |
| 2119 | + { |
| 2120 | + } |
| 2121 | + ~hipfftw_plan_bundle_t() |
| 2122 | + { |
| 2123 | + // make sure the plan destructor may be used to avoid |
| 2124 | + // throwing from the hipfftw_plan_bundle_t destructor |
| 2125 | + if(plan_destructor.may_be_used()) |
| 2126 | + { |
| 2127 | + // should be stable even if plan == nullptr; |
| 2128 | + plan_destructor(plan); |
| 2129 | + } |
| 2130 | + else if(plan) |
| 2131 | + { |
| 2132 | + std::cerr << "WARNING: A " << (prec == fft_precision_single ? "single" : "double") |
| 2133 | + << "-precision plan was seemingly created but its destructor cannot be used " |
| 2134 | + << std::endl; |
| 2135 | + } |
| 2136 | + } |
| 2137 | + // disable copies and moves |
| 2138 | + hipfftw_plan_bundle_t(const hipfftw_plan_bundle_t&) = delete; |
| 2139 | + hipfftw_plan_bundle_t& operator=(const hipfftw_plan_bundle_t&) = delete; |
| 2140 | + hipfftw_plan_bundle_t(hipfftw_plan_bundle_t&&) = delete; |
| 2141 | + hipfftw_plan_bundle_t& operator=(hipfftw_plan_bundle_t&&) = delete; |
| 2142 | +}; |
| 2143 | + |
| 2144 | +static bool rank_is_valid_for_hipfftw(int r) |
| 2145 | +{ |
| 2146 | + return r > 0; |
| 2147 | +} |
| 2148 | +template <typename T, std::enable_if_t<std::is_integral_v<T>, bool> = true> |
| 2149 | +static bool lengths_are_valid_for_hipfftw_as(const std::vector<ptrdiff_t> len, int intended_rank) |
| 2150 | +{ |
| 2151 | + if(!rank_is_valid_for_hipfftw(intended_rank)) |
| 2152 | + return false; // impossible to validate lengths for an invalid rank |
| 2153 | + // check that lengths are all strictly positive and representable with |
| 2154 | + // type T without data loss |
| 2155 | + return len.size() == intended_rank |
| 2156 | + && std::all_of(len.begin(), len.end(), [](const decltype(len)::value_type& val) { |
| 2157 | + return val > 0 && val <= std::numeric_limits<T>::max(); |
| 2158 | + }); |
| 2159 | +} |
| 2160 | +static bool sign_is_valid_for_hipfftw(int s, const fft_transform_type& dft_kind) |
| 2161 | +{ |
| 2162 | + if(is_real(dft_kind)) |
| 2163 | + return true; // sign is irrelevant for real transforms |
| 2164 | + return s == (is_fwd(dft_kind) ? FFTW_FORWARD : FFTW_BACKWARD); |
| 2165 | +} |
| 2166 | +static constexpr unsigned hipfftw_valid_flags_mask |
| 2167 | + = FFTW_WISDOM_ONLY | FFTW_MEASURE | FFTW_DESTROY_INPUT | FFTW_UNALIGNED | FFTW_CONSERVE_MEMORY |
| 2168 | + | FFTW_EXHAUSTIVE | FFTW_PRESERVE_INPUT | FFTW_PATIENT | FFTW_ESTIMATE; |
| 2169 | +static bool flags_are_valid_for_hipfftw(unsigned f) |
| 2170 | +{ |
| 2171 | + return (f & hipfftw_valid_flags_mask) == f; |
| 2172 | +} |
| 2173 | + |
| 2174 | +template < |
| 2175 | + fft_precision prec, |
| 2176 | + std::enable_if_t<prec == fft_precision_single || prec == fft_precision_double, bool> = true> |
| 2177 | +struct hipfftw_helper |
| 2178 | +{ |
| 2179 | +private: |
| 2180 | + // plan_bundle stores information about the latest plan possibly created by this |
| 2181 | + // object. A shard_ptr is used to make hipfftw_helper safe w.r.t. shallow |
| 2182 | + // copies (as required by gtest for parameterized tests). |
| 2183 | + // This member is also made mutable so we can release/create it even from a |
| 2184 | + // const-qualified objects (e.g., to release owned resources upon test completion, |
| 2185 | + // or to re-create the plan at execution if needed or found necessary) |
| 2186 | + mutable std::shared_ptr<hipfftw_plan_bundle_t<prec>> plan_bundle; |
| 2187 | + |
| 2188 | + fft_transform_type dft_kind; |
| 2189 | + int rank = 0; |
| 2190 | + std::vector<ptrdiff_t> lengths; |
| 2191 | + fft_result_placement plan_placement; |
| 2192 | + int sign = 0; |
| 2193 | + unsigned flags = std::numeric_limits<unsigned>::max(); |
| 2194 | + |
| 2195 | + template <typename T> |
| 2196 | + void reset_member_value(T& member, const T& new_value) |
| 2197 | + { |
| 2198 | + if(new_value != member) |
| 2199 | + { |
| 2200 | + member = new_value; |
| 2201 | + plan_bundle.reset(); |
| 2202 | + } |
| 2203 | + } |
| 2204 | + |
| 2205 | + hipfftw_plan_creation_func get_creation_func(hipfftw_plan_creation_func creation_options) const |
| 2206 | + { |
| 2207 | + if(!hipfftw_creation_options_are_well_defined(creation_options)) |
| 2208 | + throw std::invalid_argument("invalid creation_options for get_creation_func"); |
| 2209 | + if(!can_use_creation_options(creation_options)) |
| 2210 | + { |
| 2211 | + // e.g., rank < 0 with creation_options == hipfftw_plan_creation_func::PLAN_DFT_ND |
| 2212 | + throw std::invalid_argument( |
| 2213 | + "The plan creation options " |
| 2214 | + + hipfftw_creation_options_to_string(creation_options, dft_kind, rank) |
| 2215 | + + " cannot be used with this object"); |
| 2216 | + } |
| 2217 | + std::vector<hipfftw_plan_creation_func> valid_candidates; |
| 2218 | + for(auto candidate : hipfftw_plan_creation_func_candidates) |
| 2219 | + { |
| 2220 | + if(!(creation_options & candidate)) |
| 2221 | + continue; // candidate is not in given creation_options |
| 2222 | + if(can_use_creation_options(candidate)) |
| 2223 | + { |
| 2224 | + // If creation_options != candidate for all candidates, creation_optionsactually |
| 2225 | + // combines 2 or more candidates --> only the candidates actually supporting plan |
| 2226 | + // creation will be considered "valid". If there exists one (usable) candidate s.t. |
| 2227 | + // creation_options == candidate however, this choice is considered "enforced" |
| 2228 | + // (e.g. for function-specific argument validation testing purposes) |
| 2229 | + if(creation_options == candidate || can_create_plan_with(candidate)) |
| 2230 | + valid_candidates.push_back(candidate); |
| 2231 | + } |
| 2232 | + } |
| 2233 | + if(valid_candidates.empty()) |
| 2234 | + return hipfftw_plan_creation_func::NONE; |
| 2235 | + // "randomly" (yet reproducibly) choose |
| 2236 | + return valid_candidates[std::hash<std::string>()(token()) % valid_candidates.size()]; |
| 2237 | + } |
| 2238 | + |
| 2239 | + template <bool make_reference_plan = false> |
| 2240 | + hipfftw_plan_t<prec> |
| 2241 | + make_plan(void* in, void* out, hipfftw_plan_creation_func chosen_creation) const |
| 2242 | + { |
| 2243 | + if(std::find(hipfftw_plan_creation_func_candidates.begin(), |
| 2244 | + hipfftw_plan_creation_func_candidates.end(), |
| 2245 | + chosen_creation) |
| 2246 | + == hipfftw_plan_creation_func_candidates.end()) |
| 2247 | + { |
| 2248 | + throw std::invalid_argument("Invalid chosen_creation for hipfftw_helper::make_plan"); |
| 2249 | + } |
| 2250 | + |
| 2251 | + // fetch/infer plan creation function arguments |
| 2252 | + const auto& hipfftw_impl = hipfftw_funcs<prec>::get_instance(); |
| 2253 | + const auto int_len = get_length_as<int>(); |
| 2254 | + const int* int_len_ptr = int_len.empty() ? nullptr : int_len.data(); |
| 2255 | + |
| 2256 | + switch(chosen_creation) |
| 2257 | + { |
| 2258 | + case hipfftw_plan_creation_func::PLAN_DFT_ND: |
| 2259 | + { |
| 2260 | + if(!can_use_creation_options(hipfftw_plan_creation_func::PLAN_DFT_ND)) |
| 2261 | + throw std::runtime_error("hipfftw_plan_creation_func::PLAN_DFT_ND cannot be used."); |
| 2262 | + if(rank == 1) |
| 2263 | + { |
| 2264 | + if(dft_kind == fft_transform_type_real_forward) |
| 2265 | + { |
| 2266 | + return hipfftw_impl.plan_dft_r2c_1d.template call<make_reference_plan>( |
| 2267 | + int_len_ptr[0], |
| 2268 | + static_cast<hipfftw_real_t<prec>*>(in), |
| 2269 | + static_cast<hipfftw_complex_t<prec>*>(out), |
| 2270 | + flags); |
| 2271 | + } |
| 2272 | + else if(dft_kind == fft_transform_type_real_inverse) |
| 2273 | + { |
| 2274 | + return hipfftw_impl.plan_dft_c2r_1d.template call<make_reference_plan>( |
| 2275 | + int_len_ptr[0], |
| 2276 | + static_cast<hipfftw_complex_t<prec>*>(in), |
| 2277 | + static_cast<hipfftw_real_t<prec>*>(out), |
| 2278 | + flags); |
| 2279 | + } |
| 2280 | + else |
| 2281 | + { |
| 2282 | + |
| 2283 | + return hipfftw_impl.plan_dft_1d.template call<make_reference_plan>( |
| 2284 | + int_len_ptr[0], |
| 2285 | + static_cast<hipfftw_complex_t<prec>*>(in), |
| 2286 | + static_cast<hipfftw_complex_t<prec>*>(out), |
| 2287 | + sign, |
| 2288 | + flags); |
| 2289 | + } |
| 2290 | + } |
| 2291 | + else if(rank == 2) |
| 2292 | + { |
| 2293 | + if(dft_kind == fft_transform_type_real_forward) |
| 2294 | + { |
| 2295 | + return hipfftw_impl.plan_dft_r2c_2d.template call<make_reference_plan>( |
| 2296 | + int_len_ptr[0], |
| 2297 | + int_len_ptr[1], |
| 2298 | + static_cast<hipfftw_real_t<prec>*>(in), |
| 2299 | + static_cast<hipfftw_complex_t<prec>*>(out), |
| 2300 | + flags); |
| 2301 | + } |
| 2302 | + else if(dft_kind == fft_transform_type_real_inverse) |
| 2303 | + { |
| 2304 | + |
| 2305 | + return hipfftw_impl.plan_dft_c2r_2d.template call<make_reference_plan>( |
| 2306 | + int_len_ptr[0], |
| 2307 | + int_len_ptr[1], |
| 2308 | + static_cast<hipfftw_complex_t<prec>*>(in), |
| 2309 | + static_cast<hipfftw_real_t<prec>*>(out), |
| 2310 | + flags); |
| 2311 | + } |
| 2312 | + else |
| 2313 | + { |
| 2314 | + return hipfftw_impl.plan_dft_2d.template call<make_reference_plan>( |
| 2315 | + int_len_ptr[0], |
| 2316 | + int_len_ptr[1], |
| 2317 | + static_cast<hipfftw_complex_t<prec>*>(in), |
| 2318 | + static_cast<hipfftw_complex_t<prec>*>(out), |
| 2319 | + sign, |
| 2320 | + flags); |
| 2321 | + } |
| 2322 | + } |
| 2323 | + else |
| 2324 | + { |
| 2325 | + if(dft_kind == fft_transform_type_real_forward) |
| 2326 | + { |
| 2327 | + return hipfftw_impl.plan_dft_r2c_3d.template call<make_reference_plan>( |
| 2328 | + int_len_ptr[0], |
| 2329 | + int_len_ptr[1], |
| 2330 | + int_len_ptr[2], |
| 2331 | + static_cast<hipfftw_real_t<prec>*>(in), |
| 2332 | + static_cast<hipfftw_complex_t<prec>*>(out), |
| 2333 | + flags); |
| 2334 | + } |
| 2335 | + else if(dft_kind == fft_transform_type_real_inverse) |
| 2336 | + { |
| 2337 | + return hipfftw_impl.plan_dft_c2r_3d.template call<make_reference_plan>( |
| 2338 | + int_len_ptr[0], |
| 2339 | + int_len_ptr[1], |
| 2340 | + int_len_ptr[2], |
| 2341 | + static_cast<hipfftw_complex_t<prec>*>(in), |
| 2342 | + static_cast<hipfftw_real_t<prec>*>(out), |
| 2343 | + flags); |
| 2344 | + } |
| 2345 | + else |
| 2346 | + { |
| 2347 | + return hipfftw_impl.plan_dft_3d.template call<make_reference_plan>( |
| 2348 | + int_len_ptr[0], |
| 2349 | + int_len_ptr[1], |
| 2350 | + int_len_ptr[2], |
| 2351 | + static_cast<hipfftw_complex_t<prec>*>(in), |
| 2352 | + static_cast<hipfftw_complex_t<prec>*>(out), |
| 2353 | + sign, |
| 2354 | + flags); |
| 2355 | + } |
| 2356 | + } |
| 2357 | + } |
| 2358 | + break; |
| 2359 | + case hipfftw_plan_creation_func::PLAN_DFT: |
| 2360 | + { |
| 2361 | + if(!can_use_creation_options(hipfftw_plan_creation_func::PLAN_DFT)) |
| 2362 | + throw std::runtime_error("hipfftw_plan_creation_func::PLAN_DFT cannot be used."); |
| 2363 | + |
| 2364 | + if(dft_kind == fft_transform_type_real_forward) |
| 2365 | + { |
| 2366 | + return hipfftw_impl.plan_dft_r2c.template call<make_reference_plan>( |
| 2367 | + rank, |
| 2368 | + int_len_ptr, |
| 2369 | + static_cast<hipfftw_real_t<prec>*>(in), |
| 2370 | + static_cast<hipfftw_complex_t<prec>*>(out), |
| 2371 | + flags); |
| 2372 | + } |
| 2373 | + else if(dft_kind == fft_transform_type_real_inverse) |
| 2374 | + { |
| 2375 | + return hipfftw_impl.plan_dft_c2r.template call<make_reference_plan>( |
| 2376 | + rank, |
| 2377 | + int_len_ptr, |
| 2378 | + static_cast<hipfftw_complex_t<prec>*>(in), |
| 2379 | + static_cast<hipfftw_real_t<prec>*>(out), |
| 2380 | + flags); |
| 2381 | + } |
| 2382 | + else |
| 2383 | + { |
| 2384 | + return hipfftw_impl.plan_dft.template call<make_reference_plan>( |
| 2385 | + rank, |
| 2386 | + int_len_ptr, |
| 2387 | + static_cast<hipfftw_complex_t<prec>*>(in), |
| 2388 | + static_cast<hipfftw_complex_t<prec>*>(out), |
| 2389 | + sign, |
| 2390 | + flags); |
| 2391 | + } |
| 2392 | + } |
| 2393 | + break; |
| 2394 | + case hipfftw_plan_creation_func::PLAN_MANY: |
| 2395 | + [[fallthrough]]; |
| 2396 | + case hipfftw_plan_creation_func::PLAN_GURU: |
| 2397 | + [[fallthrough]]; |
| 2398 | + case hipfftw_plan_creation_func::PLAN_GURU64: |
| 2399 | + throw std::runtime_error("Enforced plan creation is not implemented yet"); |
| 2400 | + break; |
| 2401 | + default: |
| 2402 | + throw std::runtime_error("Unknown kind of plan creation"); |
| 2403 | + break; |
| 2404 | + } |
| 2405 | + // unreachable |
| 2406 | + } |
| 2407 | + |
| 2408 | +public: |
| 2409 | + hipfftw_helper() = default; |
| 2410 | + ~hipfftw_helper() = default; |
| 2411 | + hipfftw_helper(hipfftw_helper&& other) = default; |
| 2412 | + hipfftw_helper& operator=(hipfftw_helper&& other) = default; |
| 2413 | + hipfftw_helper(const hipfftw_helper& other) = default; |
| 2414 | + hipfftw_helper& operator=(const hipfftw_helper& rhs) = default; |
| 2415 | + |
| 2416 | + void set_creation_args(fft_transform_type dft_kind_to_set, |
| 2417 | + int rank_to_set, |
| 2418 | + const std::vector<ptrdiff_t>& lengths_to_set, |
| 2419 | + fft_result_placement placement_to_set, |
| 2420 | + int sign_to_set, |
| 2421 | + unsigned flags_to_set) |
| 2422 | + { |
| 2423 | + reset_member_value(dft_kind, dft_kind_to_set); |
| 2424 | + reset_member_value(rank, rank_to_set); |
| 2425 | + reset_member_value(lengths, lengths_to_set); |
| 2426 | + reset_member_value(plan_placement, placement_to_set); |
| 2427 | + reset_member_value(sign, sign_to_set); |
| 2428 | + reset_member_value(flags, flags_to_set); |
| 2429 | + } |
| 2430 | + // getters |
| 2431 | + fft_transform_type get_dft_kind() const |
| 2432 | + { |
| 2433 | + return dft_kind; |
| 2434 | + } |
| 2435 | + int get_rank() const |
| 2436 | + { |
| 2437 | + return rank; |
| 2438 | + } |
| 2439 | + // returns the lengths as an std::vector<T> if they may all be safely converted to T |
| 2440 | + // (the returned vector is empty otherwise) |
| 2441 | + template <typename T, std::enable_if_t<std::is_integral_v<T>, bool> = true> |
| 2442 | + std::vector<T> get_length_as() const |
| 2443 | + { |
| 2444 | + if constexpr(std::is_same_v<T, typename decltype(lengths)::value_type>) |
| 2445 | + return lengths; |
| 2446 | + std::vector<T> ret; |
| 2447 | + if(std::any_of(lengths.begin(), |
| 2448 | + lengths.end(), |
| 2449 | + [](const typename decltype(lengths)::value_type& val) { |
| 2450 | + return val < std::numeric_limits<T>::lowest() |
| 2451 | + || val > std::numeric_limits<T>::max(); |
| 2452 | + })) |
| 2453 | + { |
| 2454 | + // not a safe conversion, return empty lengths |
| 2455 | + return ret; |
| 2456 | + } |
| 2457 | + ret.assign(lengths.begin(), lengths.end()); |
| 2458 | + return ret; |
| 2459 | + } |
| 2460 | + fft_result_placement get_placement() const |
| 2461 | + { |
| 2462 | + return plan_placement; |
| 2463 | + } |
| 2464 | + int get_sign() const |
| 2465 | + { |
| 2466 | + return sign; |
| 2467 | + } |
| 2468 | + unsigned get_flags() const |
| 2469 | + { |
| 2470 | + return flags; |
| 2471 | + } |
| 2472 | + std::shared_ptr<hipfftw_plan_bundle_t<prec>> get_plan_bundle() const |
| 2473 | + { |
| 2474 | + return plan_bundle; |
| 2475 | + } |
| 2476 | + template <typename T, std::enable_if_t<std::is_integral_v<T>, bool> = true> |
| 2477 | + std::vector<T> get_strides_as(fft_io io) const |
| 2478 | + { |
| 2479 | + if(!rank_is_valid_for_hipfftw(rank) || !has_valid_lengths()) |
| 2480 | + throw std::runtime_error( |
| 2481 | + "cannot calculate default strides with invalid rank or invalid lengths"); |
| 2482 | + // only default strides for now |
| 2483 | + std::vector<ptrdiff_t> strides(rank, 1); |
| 2484 | + if(rank > 1) |
| 2485 | + { |
| 2486 | + if(is_complex(dft_kind)) |
| 2487 | + strides[rank - 2] = lengths.back(); |
| 2488 | + else |
| 2489 | + { |
| 2490 | + if(is_fwd(dft_kind) == (io == fft_io::fft_io_out)) |
| 2491 | + strides[rank - 2] = lengths.back() / 2 + 1; |
| 2492 | + else |
| 2493 | + { |
| 2494 | + if(plan_placement == fft_placement_inplace) |
| 2495 | + strides[rank - 2] = 2 * (lengths.back() / 2 + 1); |
| 2496 | + else |
| 2497 | + strides[rank - 2] = lengths.back(); |
| 2498 | + } |
| 2499 | + } |
| 2500 | + } |
| 2501 | + for(auto dim = rank - 3; dim >= 0; dim--) |
| 2502 | + strides[dim] = strides[dim + 1] * lengths[dim + 1]; |
| 2503 | + |
| 2504 | + std::vector<T> ret; |
| 2505 | + if(std::any_of(strides.begin(), |
| 2506 | + strides.end(), |
| 2507 | + [](const typename decltype(strides)::value_type& val) { |
| 2508 | + return val < std::numeric_limits<T>::lowest() |
| 2509 | + || val > std::numeric_limits<T>::max(); |
| 2510 | + })) |
| 2511 | + { |
| 2512 | + // not a safe conversion, return empty lengths |
| 2513 | + return ret; |
| 2514 | + } |
| 2515 | + ret.assign(strides.begin(), strides.end()); |
| 2516 | + return ret; |
| 2517 | + } |
| 2518 | + template <typename T, std::enable_if_t<std::is_integral_v<T>, bool> = true> |
| 2519 | + T get_dist_as(fft_io io) const |
| 2520 | + { |
| 2521 | + if(!rank_is_valid_for_hipfftw(rank) || !has_valid_lengths()) |
| 2522 | + throw std::runtime_error( |
| 2523 | + "cannot calculate default distance(s) with invalid rank or invalid lengths"); |
| 2524 | + // only default distances for now |
| 2525 | + ptrdiff_t dist = 0; |
| 2526 | + if(rank == 1) |
| 2527 | + { |
| 2528 | + if(is_complex(dft_kind)) |
| 2529 | + dist = lengths.back(); |
| 2530 | + else |
| 2531 | + { |
| 2532 | + if(is_fwd(dft_kind) == (io == fft_io::fft_io_out)) |
| 2533 | + dist = lengths.back() / 2 + 1; |
| 2534 | + else |
| 2535 | + { |
| 2536 | + if(plan_placement == fft_placement_inplace) |
| 2537 | + dist = 2 * (lengths.back() / 2 + 1); |
| 2538 | + else |
| 2539 | + dist = lengths.back(); |
| 2540 | + } |
| 2541 | + } |
| 2542 | + } |
| 2543 | + else |
| 2544 | + { |
| 2545 | + const auto strides = get_strides_as<ptrdiff_t>(io); |
| 2546 | + dist = strides.front() * lengths.front(); |
| 2547 | + } |
| 2548 | + if(dist < std::numeric_limits<T>::lowest() || dist > std::numeric_limits<T>::max()) |
| 2549 | + throw std::runtime_error("distance cannot be safely converted to the desired type"); |
| 2550 | + return static_cast<T>(dist); |
| 2551 | + } |
| 2552 | + template <typename T, std::enable_if_t<std::is_integral_v<T>, bool> = true> |
| 2553 | + T get_nbatch_as(fft_io io) const |
| 2554 | + { |
| 2555 | + // only unbatched for now |
| 2556 | + T ret = 1; |
| 2557 | + return ret; |
| 2558 | + } |
| 2559 | + |
| 2560 | + // validity checks |
| 2561 | + bool has_valid_rank() const |
| 2562 | + { |
| 2563 | + return rank_is_valid_for_hipfftw(rank); |
| 2564 | + } |
| 2565 | + bool has_valid_lengths() const |
| 2566 | + { |
| 2567 | + return lengths_are_valid_for_hipfftw_as<ptrdiff_t>(lengths, rank); |
| 2568 | + } |
| 2569 | + bool has_valid_sign() const |
| 2570 | + { |
| 2571 | + return sign_is_valid_for_hipfftw(sign, dft_kind); |
| 2572 | + } |
| 2573 | + bool has_valid_flags() const |
| 2574 | + { |
| 2575 | + return flags_are_valid_for_hipfftw(flags); |
| 2576 | + } |
| 2577 | + // checks if the current parameters can be used with (any of) the given option(s) of |
| 2578 | + // plan creation (NOT whether they're valid or not). For instance, one cannot possibly |
| 2579 | + // communicate rank > 3 with hipfftw_plan_creation_func::PLAN_DFT_ND, or communicate |
| 2580 | + // non-default strides with hipfftw_plan_creation_func::PLAN_DFT_ND or |
| 2581 | + // hipfftw_plan_creation_func::PLAN_DFT... |
| 2582 | + // TODO: expand logic when extra configuration parameters are added (e.g. batch sizes, |
| 2583 | + // strides, etc.) |
| 2584 | + bool can_use_creation_options(hipfftw_plan_creation_func creation_options) const |
| 2585 | + { |
| 2586 | + if(!hipfftw_creation_options_are_well_defined(creation_options)) |
| 2587 | + throw std::invalid_argument( |
| 2588 | + "ill-defined creation_options used in can_use_creation_options"); |
| 2589 | + if(creation_options == hipfftw_plan_creation_func::NONE) |
| 2590 | + return false; |
| 2591 | + if(std::find(hipfftw_plan_creation_func_candidates.begin(), |
| 2592 | + hipfftw_plan_creation_func_candidates.end(), |
| 2593 | + creation_options) |
| 2594 | + == hipfftw_plan_creation_func_candidates.end()) |
| 2595 | + { |
| 2596 | + // creation_options combines several candidates in hipfftw_plan_creation_func_candidates |
| 2597 | + // --> parse them individually and find out if any applicable can be used |
| 2598 | + return std::any_of(hipfftw_plan_creation_func_candidates.begin(), |
| 2599 | + hipfftw_plan_creation_func_candidates.end(), |
| 2600 | + [=](const hipfftw_plan_creation_func& candidate) { |
| 2601 | + return (creation_options & candidate) |
| 2602 | + && can_use_creation_options(candidate); |
| 2603 | + }); |
| 2604 | + } |
| 2605 | + // "creation_options" actually is an individual value in hipfftw_plan_creation_func_candidates |
| 2606 | + switch(creation_options) |
| 2607 | + { |
| 2608 | + case hipfftw_plan_creation_func::PLAN_DFT_ND: |
| 2609 | + // rank is not passed as an argument but dictated by the called function, |
| 2610 | + // (must be 1, 2, or 3), and as many lengths must be passed as individual |
| 2611 | + // integer values |
| 2612 | + return (rank == 1 || rank == 2 || rank == 3) && get_length_as<int>().size() == rank; |
| 2613 | + break; |
| 2614 | + case hipfftw_plan_creation_func::PLAN_DFT: |
| 2615 | + // the lengths must be representable as integers, if not empty (supposedly |
| 2616 | + // intentionally, e.g., for input validation testing purposes) |
| 2617 | + return lengths.empty() || get_length_as<int>().size() == rank; |
| 2618 | + break; |
| 2619 | + case hipfftw_plan_creation_func::PLAN_MANY: |
| 2620 | + [[fallthrough]]; |
| 2621 | + case hipfftw_plan_creation_func::PLAN_GURU: |
| 2622 | + [[fallthrough]]; |
| 2623 | + case hipfftw_plan_creation_func::PLAN_GURU64: |
| 2624 | + return false; |
| 2625 | + break; |
| 2626 | + default: |
| 2627 | + throw std::runtime_error("hipfftw_helper: internal error encountered (unexpected value " |
| 2628 | + "for creation_options)"); |
| 2629 | + break; |
| 2630 | + } |
| 2631 | + // unreachable |
| 2632 | + } |
| 2633 | + |
| 2634 | + // checks validity of configuration parameters and whether creation can be |
| 2635 | + // attempted via (any of) the given option(s) |
| 2636 | + bool is_valid_for_creation_with(hipfftw_plan_creation_func creation_options) const |
| 2637 | + { |
| 2638 | + if(!hipfftw_creation_options_are_well_defined(creation_options)) |
| 2639 | + throw std::invalid_argument("invalid creation_options for is_valid_for_creation_with"); |
| 2640 | + |
| 2641 | + // TODO: expand the global validity checks below when this struct is |
| 2642 | + // expanded to cover more configurations (e.g., batching, srides, etc.) |
| 2643 | + return has_valid_rank() && has_valid_lengths() && has_valid_sign() && has_valid_flags() |
| 2644 | + && can_use_creation_options(creation_options); |
| 2645 | + } |
| 2646 | + bool is_valid_for_creation() const |
| 2647 | + { |
| 2648 | + return is_valid_for_creation_with(hipfftw_plan_creation_func::ANY); |
| 2649 | + } |
| 2650 | + // check expected support by (any of) the given option(s) |
| 2651 | + bool has_unsupported_args_for(hipfftw_plan_creation_func creation_options) const |
| 2652 | + { |
| 2653 | + // extra conditions for configurations supported by hipfftw: |
| 2654 | + if(rank > 3) |
| 2655 | + return true; |
| 2656 | + if(flags & FFTW_WISDOM_ONLY) |
| 2657 | + return true; |
| 2658 | + if(dft_kind == fft_transform_type_real_inverse && rank > 1 && (flags & FFTW_PRESERVE_INPUT)) |
| 2659 | + return true; |
| 2660 | + if(!(creation_options & hipfftw_plan_creation_func::PLAN_GURU64) && has_valid_rank() |
| 2661 | + && has_valid_lengths()) |
| 2662 | + { |
| 2663 | + // cannot handle data sizes involving more elements than the |
| 2664 | + // largest representable int value |
| 2665 | + if(get_num_elements_in(fft_io_in) > std::numeric_limits<int>::max() |
| 2666 | + || get_num_elements_in(fft_io_out) > std::numeric_limits<int>::max()) |
| 2667 | + return true; |
| 2668 | + } |
| 2669 | + return false; |
| 2670 | + } |
| 2671 | + bool can_create_plan_with(hipfftw_plan_creation_func creation_options) const |
| 2672 | + { |
| 2673 | + if(!hipfftw_creation_options_are_well_defined(creation_options)) |
| 2674 | + throw std::invalid_argument("invalid creation_option for can_create_plan_with"); |
| 2675 | + |
| 2676 | + if(!is_valid_for_creation_with(creation_options)) |
| 2677 | + return false; |
| 2678 | + if(has_unsupported_args_for(creation_options)) |
| 2679 | + return false; |
| 2680 | + return true; |
| 2681 | + } |
| 2682 | + bool can_create_plan() const |
| 2683 | + { |
| 2684 | + return can_create_plan_with(hipfftw_plan_creation_func::ANY); |
| 2685 | + } |
| 2686 | + // create a token consistent with other tests to enable kernel precompilation |
| 2687 | + // for valid cases, and/or capturing all required details about members otherwise |
| 2688 | + std::string token() const |
| 2689 | + { |
| 2690 | + std::ostringstream ret; |
| 2691 | + switch(dft_kind) |
| 2692 | + { |
| 2693 | + case fft_transform_type_complex_forward: |
| 2694 | + ret << "complex_forward"; |
| 2695 | + break; |
| 2696 | + case fft_transform_type_complex_inverse: |
| 2697 | + ret << "complex_inverse"; |
| 2698 | + break; |
| 2699 | + case fft_transform_type_real_forward: |
| 2700 | + ret << "real_forward"; |
| 2701 | + break; |
| 2702 | + case fft_transform_type_real_inverse: |
| 2703 | + ret << "real_inverse"; |
| 2704 | + break; |
| 2705 | + default: |
| 2706 | + throw std::runtime_error("unknown type of transform"); |
| 2707 | + } |
| 2708 | + |
| 2709 | + // report rank if invalid |
| 2710 | + if(!has_valid_rank() || lengths.empty()) |
| 2711 | + ret << "_invalid_rank" << (rank < 0 ? "_negative_" : "_") << std::abs(rank); |
| 2712 | + ret << "_len"; |
| 2713 | + if(lengths.empty()) |
| 2714 | + ret << "_none"; |
| 2715 | + else |
| 2716 | + { |
| 2717 | + for(const auto& len : lengths) |
| 2718 | + ret << (len < 0 ? "_negative_" : "_") << std::abs(len); |
| 2719 | + } |
| 2720 | + if constexpr(prec == fft_precision_single) |
| 2721 | + ret << "_single"; |
| 2722 | + else |
| 2723 | + ret << "_double"; |
| 2724 | + ret << (plan_placement == fft_placement_inplace ? "_ip" : "_op"); |
| 2725 | + // only supporting unbatched cases as of now |
| 2726 | + ret << "_batch_1"; |
| 2727 | + if(has_valid_rank() && has_valid_lengths()) |
| 2728 | + { |
| 2729 | + ret << "_istride"; |
| 2730 | + for(const auto& stride : get_strides_as<size_t>(fft_io::fft_io_in)) |
| 2731 | + ret << "_" << stride; |
| 2732 | + if(!is_real(dft_kind)) |
| 2733 | + ret << "_CI"; |
| 2734 | + else if(dft_kind == fft_transform_type_real_forward) |
| 2735 | + ret << "_R"; |
| 2736 | + else |
| 2737 | + ret << "_HI"; |
| 2738 | + ret << "_ostride"; |
| 2739 | + for(const auto& stride : get_strides_as<size_t>(fft_io::fft_io_out)) |
| 2740 | + ret << "_" << stride; |
| 2741 | + if(!is_real(dft_kind)) |
| 2742 | + ret << "_CI"; |
| 2743 | + else if(dft_kind == fft_transform_type_real_forward) |
| 2744 | + ret << "_HI"; |
| 2745 | + else |
| 2746 | + ret << "_R"; |
| 2747 | + ret << "_idist_" << get_dist_as<size_t>(fft_io::fft_io_in); |
| 2748 | + ret << "_odist_" << get_dist_as<size_t>(fft_io::fft_io_out); |
| 2749 | + ret << "_ioffset_0_0_ooffset_0_0"; |
| 2750 | + } |
| 2751 | + if(!has_valid_sign()) |
| 2752 | + ret << "_invalid_sign" << (sign < 0 ? "_negative_" : "_") << std::abs(sign); |
| 2753 | + ret << "_flags_" << flags; |
| 2754 | + return ret.str(); |
| 2755 | + } |
| 2756 | + // create_plan invokes an hipfftw plan creation function for the object's configuration |
| 2757 | + // parameters, the corresponding plan pointer returned by hipfftw is stored internally. |
| 2758 | + // IMPORTANT NOTE: if one wants to target a specific creation function (as represented |
| 2759 | + // by any value in hipfftw_plan_creation_func_candidates), setting the creation_options |
| 2760 | + // argument to that specific value effectively bypasses the verification that the |
| 2761 | + // object's configuration is actually (expected to be) supported and attempts the plan |
| 2762 | + // creation anyways (unless it simply cannot be done, e.g., attempting |
| 2763 | + // creation_options = hipfftw_plan_creation_func::PLAN_DFT_ND herein on an object |
| 2764 | + // holding a value for rank > 3 simply cannot be done) |
| 2765 | + void create_plan(void* in, |
| 2766 | + void* out, |
| 2767 | + hipfftw_plan_creation_func creation_options |
| 2768 | + = hipfftw_plan_creation_func::ANY) const |
| 2769 | + { |
| 2770 | + const auto& hipfftw_impl = hipfftw_funcs<prec>::get_instance(); |
| 2771 | + const hipfftw_plan_creation_func chosen_option = get_creation_func(creation_options); |
| 2772 | + if(chosen_option == hipfftw_plan_creation_func::NONE) |
| 2773 | + { |
| 2774 | + plan_bundle = std::make_shared<hipfftw_plan_bundle_t<prec>>(hipfftw_impl.destroy_plan); |
| 2775 | + plan_bundle->creation_io = {in, out}; |
| 2776 | + plan_bundle->plan = nullptr; |
| 2777 | + plan_bundle->creation_func = chosen_option; |
| 2778 | + plan_bundle->plan_token = ""; |
| 2779 | + return; |
| 2780 | + } |
| 2781 | + // early return if there is no need to (re)build |
| 2782 | + if(plan_bundle && plan_bundle->plan_token == token() && plan_bundle->creation_io.first == in |
| 2783 | + && plan_bundle->creation_io.second == out && plan_bundle->creation_func == chosen_option) |
| 2784 | + return; |
| 2785 | + |
| 2786 | + // create the desired plan |
| 2787 | + plan_bundle = std::make_shared<hipfftw_plan_bundle_t<prec>>(hipfftw_impl.destroy_plan); |
| 2788 | + plan_bundle->plan = make_plan(in, out, chosen_option); |
| 2789 | + plan_bundle->creation_io = {in, out}; |
| 2790 | + plan_bundle->creation_func = chosen_option; |
| 2791 | + plan_bundle->plan_token = token(); |
| 2792 | + } |
| 2793 | + |
| 2794 | + // returns a reference FFTW plan for the current configuration |
| 2795 | + // The returned plan is NOT owned by this object! |
| 2796 | + hipfftw_plan_t<prec> get_reference_plan(void* in, |
| 2797 | + void* out, |
| 2798 | + hipfftw_plan_creation_func creation_options |
| 2799 | + = hipfftw_plan_creation_func::ANY) const |
| 2800 | + { |
| 2801 | + const hipfftw_plan_creation_func chosen_option = get_creation_func(creation_options); |
| 2802 | + if(chosen_option == hipfftw_plan_creation_func::NONE) |
| 2803 | + { |
| 2804 | + return nullptr; |
| 2805 | + } |
| 2806 | + constexpr bool make_reference_plan = true; |
| 2807 | + return make_plan<make_reference_plan>(in, out, chosen_option); |
| 2808 | + } |
| 2809 | + |
| 2810 | + void execute(void* execute_in, void* execute_out) const |
| 2811 | + { |
| 2812 | + if(!plan_bundle || plan_bundle->plan_token != token()) |
| 2813 | + { |
| 2814 | + // plan is not created or possibly not up-to-date |
| 2815 | + create_plan(execute_in, execute_out); |
| 2816 | + } |
| 2817 | + |
| 2818 | + const auto& hipfftw_impl = hipfftw_funcs<prec>::get_instance(); |
| 2819 | + if(execute_in == plan_bundle->creation_io.first |
| 2820 | + && execute_out == plan_bundle->creation_io.second) |
| 2821 | + { |
| 2822 | + hipfftw_impl.execute(plan_bundle->plan); |
| 2823 | + } |
| 2824 | + else |
| 2825 | + { |
| 2826 | + throw std::runtime_error("New-array execution functions not implemented yet."); |
| 2827 | + } |
| 2828 | + } |
| 2829 | + |
| 2830 | + // TODO: revise/expand logic below when the structure is expanded for more cases (batches, |
| 2831 | + // non-default strides, etc.) |
| 2832 | + size_t get_num_elements_in(fft_io in_or_out) const |
| 2833 | + { |
| 2834 | + if(in_or_out != fft_io_in && in_or_out != fft_io_out) |
| 2835 | + throw std::invalid_argument("invalid in_or_out for get_num_elements_in"); |
| 2836 | + if(!has_valid_rank() || !has_valid_lengths()) |
| 2837 | + throw std::runtime_error("get_num_elements_in requires valid rank and lengths"); |
| 2838 | + const auto tmp = get_length_as<size_t>(); |
| 2839 | + if(tmp.empty() || tmp.size() != rank) |
| 2840 | + { |
| 2841 | + throw std::runtime_error( |
| 2842 | + "get_num_elements_in failed to correctly convert lengths to size_t values"); |
| 2843 | + } |
| 2844 | + size_t num_elems = 1; |
| 2845 | + if(is_complex(dft_kind)) |
| 2846 | + { |
| 2847 | + num_elems *= tmp[rank - 1]; |
| 2848 | + } |
| 2849 | + else |
| 2850 | + { |
| 2851 | + const size_t cmplx_len = tmp[rank - 1] / 2 + 1; |
| 2852 | + if(is_fwd(dft_kind) == (in_or_out == fft_io_out)) |
| 2853 | + num_elems *= cmplx_len; |
| 2854 | + else |
| 2855 | + num_elems |
| 2856 | + *= plan_placement == fft_placement_inplace ? 2 * cmplx_len : tmp[rank - 1]; |
| 2857 | + } |
| 2858 | + num_elems *= product(tmp.begin(), tmp.begin() + rank - 1); |
| 2859 | + return num_elems; |
| 2860 | + } |
| 2861 | + |
| 2862 | + size_t get_data_byte_size(fft_io in_or_out) const |
| 2863 | + { |
| 2864 | + if(in_or_out != fft_io_in && in_or_out != fft_io_out) |
| 2865 | + throw std::invalid_argument("invalid in_or_out for get_data_byte_size"); |
| 2866 | + // for in-place, input and output data sizes are enforced equal |
| 2867 | + std::vector<fft_io> io_range_to_consider = {in_or_out}; |
| 2868 | + if(plan_placement == fft_placement_inplace) |
| 2869 | + io_range_to_consider.push_back(in_or_out == fft_io::fft_io_in ? fft_io::fft_io_out |
| 2870 | + : fft_io::fft_io_in); |
| 2871 | + |
| 2872 | + size_t ret = 0; |
| 2873 | + for(auto io : io_range_to_consider) |
| 2874 | + { |
| 2875 | + const size_t num_elems = get_num_elements_in(io); |
| 2876 | + if(is_complex(dft_kind) || (is_fwd(dft_kind) == (io == fft_io_out))) |
| 2877 | + ret = std::max(ret, num_elems * sizeof(hipfftw_complex_t<prec>)); |
| 2878 | + else |
| 2879 | + ret = std::max(ret, num_elems * sizeof(hipfftw_real_t<prec>)); |
| 2880 | + } |
| 2881 | + return ret; |
| 2882 | + } |
| 2883 | + void release_plan() const |
| 2884 | + { |
| 2885 | + plan_bundle.reset(); |
| 2886 | + } |
| 2887 | +}; |
| 2888 | + |
| 2889 | +#endif |
| 2890 | diff --git a/clients/samples/CMakeLists.txt b/clients/samples/CMakeLists.txt |
| 2891 | index 488e09d..37cc924 100644 |
| 2892 | --- a/clients/samples/CMakeLists.txt |
| 2893 | +++ b/clients/samples/CMakeLists.txt |
| 2894 | @@ -124,8 +124,15 @@ foreach( sample ${sample_list} ) |
| 2895 | |
| 2896 | endforeach() |
| 2897 | |
| 2898 | -# cuFFT callback code must be compiled with -dc to enable relocatable |
| 2899 | -# device code |
| 2900 | -if( BUILD_WITH_LIB STREQUAL "CUDA" AND hipfft_callback IN_LIST sample_list ) |
| 2901 | - target_compile_options( hipfft_callback PRIVATE -dc ) |
| 2902 | +# callback code must be compiled as relocatable device code |
| 2903 | +if( hipfft_callback IN_LIST sample_list ) |
| 2904 | + if( BUILD_WITH_LIB STREQUAL "CUDA" ) |
| 2905 | + target_compile_options( hipfft_callback PRIVATE -dc ) |
| 2906 | + else() |
| 2907 | + # -fgpu-rdc causes failure at link stage on Windows |
| 2908 | + if (NOT WIN32) |
| 2909 | + target_compile_options( hipfft_callback PRIVATE -fgpu-rdc ) |
| 2910 | + target_link_options( hipfft_callback PRIVATE -fgpu-rdc ) |
| 2911 | + endif() |
| 2912 | + endif() |
| 2913 | endif() |
| 2914 | diff --git a/clients/samples/hipfft_callback.cpp b/clients/samples/hipfft_callback.cpp |
| 2915 | index b365e87..8a77266 100644 |
| 2916 | --- a/clients/samples/hipfft_callback.cpp |
| 2917 | +++ b/clients/samples/hipfft_callback.cpp |
| 2918 | @@ -20,6 +20,7 @@ |
| 2919 | // THE SOFTWARE. |
| 2920 | |
| 2921 | #include <iostream> |
| 2922 | +#ifndef WIN32 |
| 2923 | #include <vector> |
| 2924 | |
| 2925 | #include <hip/hip_runtime.h> |
| 2926 | @@ -48,9 +49,15 @@ __device__ hipfftDoubleComplex load_callback(hipfftDoubleComplex* input, |
| 2927 | } |
| 2928 | |
| 2929 | __device__ auto load_callback_dev = load_callback; |
| 2930 | +#endif |
| 2931 | |
| 2932 | int main() |
| 2933 | { |
| 2934 | +#ifdef WIN32 |
| 2935 | + std::cout << "This sample is temporarily disabled on Windows" << std::endl; |
| 2936 | + return EXIT_SUCCESS; |
| 2937 | +#else |
| 2938 | + |
| 2939 | std::cout << "hipfft 1D double-precision complex-to-complex transform with callback\n"; |
| 2940 | |
| 2941 | const int Nx = 8; |
| 2942 | @@ -156,4 +163,5 @@ int main() |
| 2943 | throw std::runtime_error("hipFree failed"); |
| 2944 | |
| 2945 | return 0; |
| 2946 | +#endif |
| 2947 | } |
| 2948 | diff --git a/clients/tests/CMakeLists.txt b/clients/tests/CMakeLists.txt |
| 2949 | index 28c6594..91024cf 100644 |
| 2950 | --- a/clients/tests/CMakeLists.txt |
| 2951 | +++ b/clients/tests/CMakeLists.txt |
| 2952 | @@ -32,6 +32,13 @@ else( ) |
| 2953 | "Install path prefix, prepended onto install directories" ) |
| 2954 | endif( ) |
| 2955 | |
| 2956 | +# Dependencies |
| 2957 | + |
| 2958 | +find_package( ROCmCMakeBuildTools REQUIRED CONFIG PATHS /opt/rocm ) |
| 2959 | +include(ROCMInstallTargets) |
| 2960 | +list( APPEND CMAKE_MODULE_PATH ${CMAKE_CURRENT_SOURCE_DIR}/../cmake ) |
| 2961 | + |
| 2962 | + |
| 2963 | # This has to be initialized before the project() command appears |
| 2964 | # Set the default of CMAKE_BUILD_TYPE to be release, unless user |
| 2965 | # specifies with -D. MSVC_IDE does not use CMAKE_BUILD_TYPE |
| 2966 | @@ -42,13 +49,19 @@ endif() |
| 2967 | |
| 2968 | project( hipfft-clients-tests LANGUAGES CXX ) |
| 2969 | |
| 2970 | +if( NOT HIPFFT_BUILD_SCOPE ) |
| 2971 | + find_package( hipfft REQUIRED CONFIG PATHS ) |
| 2972 | +endif() |
| 2973 | |
| 2974 | find_package( Boost REQUIRED) |
| 2975 | |
| 2976 | set( Boost_USE_STATIC_LIBS OFF ) |
| 2977 | |
| 2978 | + |
| 2979 | find_package( FFTW 3.0 REQUIRED MODULE COMPONENTS FLOAT DOUBLE ) |
| 2980 | |
| 2981 | +set( BUILD_WITH_LIB "ROCM" CACHE STRING "Build ${PROJECT_NAME} with ROCM or CUDA libraries" ) |
| 2982 | + |
| 2983 | set( THREADS_PREFER_PTHREAD_FLAG ON ) |
| 2984 | find_package( Threads REQUIRED ) |
| 2985 | |
| 2986 | @@ -60,7 +73,9 @@ set( hipfft-test_source |
| 2987 | accuracy_test_2D.cpp |
| 2988 | accuracy_test_3D.cpp |
| 2989 | accuracy_test_callback.cpp |
| 2990 | + hipfftw_test.cpp |
| 2991 | multi_device_test.cpp |
| 2992 | + multi_stream_test.cpp |
| 2993 | ../../shared/array_validator.cpp |
| 2994 | ) |
| 2995 | |
| 2996 | @@ -84,7 +99,7 @@ if( NOT BUILD_WITH_LIB STREQUAL "CUDA" ) |
| 2997 | if( WIN32 ) |
| 2998 | find_package( HIP CONFIG REQUIRED ) |
| 2999 | else() |
| 3000 | - find_package( HIP MODULE REQUIRED ) |
| 3001 | + find_package( hip REQUIRED CONFIG PATHS /opt/rocm/lib/cmake/hip/ ) |
| 3002 | endif() |
| 3003 | endif() |
| 3004 | |
| 3005 | @@ -98,6 +113,20 @@ endif() |
| 3006 | string( CONCAT TESTS_OUT_DIR "${PROJECT_BINARY_DIR}" ${TESTS_OUT_DIR} ) |
| 3007 | |
| 3008 | option( BUILD_CLIENTS_TESTS_OPENMP "Build tests with OpenMP" ON ) |
| 3009 | +if( BUILD_CLIENTS_TESTS_OPENMP AND NOT BUILD_WITH_LIB STREQUAL "CUDA" ) |
| 3010 | + # Attempt to find a config version, which provides openmp_LIB_DIR. |
| 3011 | + find_package( OpenMP CONFIG PATHS "${HIP_CLANG_ROOT}/lib/cmake" ) |
| 3012 | + if( NOT OPENMP_FOUND OR NOT DEFINED ${openmp_LIB_DIR} ) |
| 3013 | + # Fall-back to module mode. |
| 3014 | + find_package( OpenMP REQUIRED ) |
| 3015 | + set( BUILD_RPATH "${HIP_CLANG_ROOT}/lib" ) |
| 3016 | + set( INSTALL_RPATH "$ORIGIN/../llvm/lib" ) |
| 3017 | + else() |
| 3018 | + set( BUILD_RPATH "${HIP_CLANG_ROOT}/${openmp_LIB_DIR}" ) |
| 3019 | + set( INSTALL_RPATH "$ORIGIN/../llvm/${openmp_LIB_DIR}" ) |
| 3020 | + |
| 3021 | + endif() |
| 3022 | +endif() |
| 3023 | |
| 3024 | foreach( target ${TEST_TARGETS} ) |
| 3025 | set_target_properties( ${target} PROPERTIES |
| 3026 | @@ -105,6 +134,15 @@ foreach( target ${TEST_TARGETS} ) |
| 3027 | CXX_STANDARD_REQUIRED ON |
| 3028 | ) |
| 3029 | |
| 3030 | + if( BUILD_CLIENTS_TESTS_OPENMP ) |
| 3031 | + set_target_properties( ${TEST_TARGETS} PROPERTIES |
| 3032 | + BUILD_RPATH "${BUILD_RPATH}" |
| 3033 | + ) |
| 3034 | + set_target_properties( ${TEST_TARGETS} PROPERTIES |
| 3035 | + INSTALL_RPATH "${INSTALL_RPATH}" |
| 3036 | + ) |
| 3037 | + endif() |
| 3038 | + |
| 3039 | if( BUILD_WITH_LIB STREQUAL "ROCM" ) |
| 3040 | target_compile_options( ${target} PRIVATE ${WARNING_FLAGS} ) |
| 3041 | target_link_libraries( ${target} |
| 3042 | @@ -112,7 +150,7 @@ foreach( target ${TEST_TARGETS} ) |
| 3043 | hip::host |
| 3044 | hip::device |
| 3045 | ) |
| 3046 | - foreach( gpu_target ${AMDGPU_TARGETS} ) |
| 3047 | + foreach( gpu_target ${GPU_TARGETS} ) |
| 3048 | target_compile_options( ${target} PRIVATE --offload-arch=${gpu_target} ) |
| 3049 | endforeach() |
| 3050 | |
| 3051 | @@ -140,30 +178,12 @@ foreach( target ${TEST_TARGETS} ) |
| 3052 | target_compile_definitions( ${target} PUBLIC _CUFFT_BACKEND ) |
| 3053 | endif() |
| 3054 | |
| 3055 | - if( BUILD_CLIENTS_TESTS_OPENMP ) |
| 3056 | - if( BUILD_WITH_LIB STREQUAL "CUDA" ) |
| 3057 | - message( STATUS "OpenMP is not supported on CUDA, building tests without it" ) |
| 3058 | - else() |
| 3059 | - target_compile_options( ${target} PRIVATE -DBUILD_CLIENTS_TESTS_OPENMP ) |
| 3060 | - if(NOT (CMAKE_CXX_COMPILER MATCHES ".*hipcc$" OR CMAKE_CXX_COMPILER MATCHES ".*clang\\+\\+")) |
| 3061 | - target_compile_options( ${target} PRIVATE -fopenmp ) |
| 3062 | - target_link_libraries( ${target} PRIVATE -fopenmp -L${HIP_CLANG_ROOT}/lib -Wl,-rpath=${HIP_CLANG_ROOT}/lib ) |
| 3063 | - target_include_directories( ${target} PRIVATE ${HIP_CLANG_ROOT}/include ) |
| 3064 | - else() |
| 3065 | - if(CMAKE_CXX_COMPILER_ID STREQUAL "Clang") |
| 3066 | - target_compile_options( ${target} PRIVATE -fopenmp=libomp ) |
| 3067 | - target_link_options( ${target} PRIVATE -fopenmp=libomp ) |
| 3068 | - endif() |
| 3069 | - endif() |
| 3070 | - endif() |
| 3071 | - endif() |
| 3072 | |
| 3073 | target_include_directories( ${target} |
| 3074 | PRIVATE |
| 3075 | $<BUILD_INTERFACE:${Boost_INCLUDE_DIRS}> |
| 3076 | $<BUILD_INTERFACE:${FFTW_INCLUDE_DIRS}> |
| 3077 | $<BUILD_INTERFACE:${hip_INCLUDE_DIRS}> |
| 3078 | - $<BUILD_INTERFACE:${CMAKE_CURRENT_SOURCE_DIR}/../../library/include> |
| 3079 | ) |
| 3080 | |
| 3081 | target_link_libraries( ${target} |
| 3082 | @@ -171,6 +191,18 @@ foreach( target ${TEST_TARGETS} ) |
| 3083 | hip::hipfft |
| 3084 | ${FFTW_LIBRARIES} |
| 3085 | ) |
| 3086 | + |
| 3087 | + if( BUILD_CLIENTS_TESTS_OPENMP ) |
| 3088 | + if( BUILD_WITH_LIB STREQUAL "CUDA" ) |
| 3089 | + message( STATUS "OpenMP is not supported on CUDA, building tests without it" ) |
| 3090 | + else() |
| 3091 | + if( DEFINED ${openmp_LIB_DIR} ) |
| 3092 | + set_target_properties( ${target} PROPERTIES BUILD_RPATH "${HIP_CLANG_ROOT}/${openmp_LIB_DIR}" ) |
| 3093 | + set_target_properties( ${target} PROPERTIES INSTALL_RPATH "${HIP_CLANG_ROOT}/${openmp_LIB_DIR}" ) |
| 3094 | + endif() |
| 3095 | + target_link_libraries( ${target} PRIVATE OpenMP::OpenMP_CXX ) |
| 3096 | + endif() |
| 3097 | + endif() |
| 3098 | |
| 3099 | if( HIPFFT_MPI_ENABLE ) |
| 3100 | target_link_libraries( ${target} |
| 3101 | @@ -187,15 +219,29 @@ foreach( target ${TEST_TARGETS} ) |
| 3102 | rocm_install(TARGETS ${target} COMPONENT tests) |
| 3103 | endforeach() |
| 3104 | |
| 3105 | -if( GTEST_FOUND ) |
| 3106 | +find_package( GTest 1.11.0 ) |
| 3107 | + |
| 3108 | +if( GTest_FOUND ) |
| 3109 | target_include_directories( hipfft-test PRIVATE $<BUILD_INTERFACE:${GTEST_INCLUDE_DIRS}> ) |
| 3110 | - target_link_libraries( hipfft-test PRIVATE ${GTEST_LIBRARIES} ) |
| 3111 | + target_link_libraries( hipfft-test PRIVATE GTest::gtest ) |
| 3112 | else() |
| 3113 | + # gtest build by the hipFFT |
| 3114 | add_dependencies( hipfft-test gtest ) |
| 3115 | target_include_directories( hipfft-test PRIVATE hipfft-test_include_dirs ${GTEST_INCLUDE_DIRS} ) |
| 3116 | target_link_libraries( hipfft-test PRIVATE ${GTEST_LIBRARIES} ) |
| 3117 | endif() |
| 3118 | |
| 3119 | +# tests have callback functions, which need to be built as relocatable device code |
| 3120 | +if( BUILD_WITH_LIB STREQUAL "CUDA" ) |
| 3121 | + target_compile_options( hipfft-test PRIVATE -dc ) |
| 3122 | +else() |
| 3123 | + # -fgpu-rdc causes failure at link stage on Windows |
| 3124 | + if (NOT WIN32) |
| 3125 | + target_compile_options( hipfft-test PRIVATE -fgpu-rdc ) |
| 3126 | + target_link_options( hipfft-test PRIVATE -fgpu-rdc ) |
| 3127 | + endif() |
| 3128 | +endif() |
| 3129 | + |
| 3130 | if(FFTW_MULTITHREAD) |
| 3131 | target_compile_options( hipfft-test PRIVATE -DFFTW_MULTITHREAD ) |
| 3132 | endif( ) |
| 3133 | @@ -203,8 +249,14 @@ endif( ) |
| 3134 | target_link_libraries( hipfft-test |
| 3135 | PRIVATE |
| 3136 | Threads::Threads |
| 3137 | + ${CMAKE_DL_LIBS} |
| 3138 | ) |
| 3139 | |
| 3140 | +# hipfft-test will opens the hipfftw library but does not link to it |
| 3141 | +if( TARGET hipfftw ) |
| 3142 | + add_dependencies( hipfft-test hipfftw ) |
| 3143 | +endif() |
| 3144 | + |
| 3145 | if (WIN32) |
| 3146 | |
| 3147 | # Ensure tests run with HIP DLLs and not anything the driver owns |
| 3148 | @@ -222,3 +274,42 @@ if (WIN32) |
| 3149 | add_custom_command( TARGET hipfft-test POST_BUILD COMMAND ${CMAKE_COMMAND} ARGS -E copy ${file_i} $<TARGET_FILE_DIR:hipfft-test> ) |
| 3150 | endforeach( file_i ) |
| 3151 | endif() |
| 3152 | + |
| 3153 | +option(BUILD_CODE_COVERAGE "Build with code coverage flags (clang only)" OFF) |
| 3154 | +set(COVERAGE_TEST_OPTIONS "--smoketest;--gtest_filter=-*call*" CACHE STRING "Command line arguments for hipfft-test when generating a code coverage report (Note: an additional run of hipfft-test targeting multi_gpu* and callback* tests is always executed and coverage results are aggregated)") |
| 3155 | +if (BUILD_CODE_COVERAGE) |
| 3156 | + add_custom_target( |
| 3157 | + code_cov_tests |
| 3158 | + DEPENDS hipfft-test |
| 3159 | + COMMAND ${CMAKE_COMMAND} -E rm -rf ./coverage-report |
| 3160 | + COMMAND ${CMAKE_COMMAND} -E make_directory ./coverage-report/profraw |
| 3161 | + COMMAND ${CMAKE_COMMAND} -E env LLVM_PROFILE_FILE="./coverage-report/profraw/hipfft-coverage_%p.profraw" GTEST_LISTENER=NO_PASS_LINE_IN_LOG $<TARGET_FILE:hipfft-test> --precompile=./clients/staging/hipfft-test-precompile.db ${COVERAGE_TEST_OPTIONS} |
| 3162 | + COMMAND ${CMAKE_COMMAND} -E env LLVM_PROFILE_FILE="./coverage-report/profraw/hipfft-coverage_%p.profraw" GTEST_LISTENER=NO_PASS_LINE_IN_LOG $<TARGET_FILE:hipfft-test> --precompile=./clients/staging/hipfft-test-precompile-multi_gpu-plus-callback.db --gtest_filter=multi_gpu*:callback* |
| 3163 | + WORKING_DIRECTORY ${CMAKE_BINARY_DIR} |
| 3164 | + ) |
| 3165 | + |
| 3166 | + find_program( |
| 3167 | + LLVM_PROFDATA |
| 3168 | + llvm-profdata |
| 3169 | + REQUIRED |
| 3170 | + HINTS ${ROCM_PATH}/llvm/bin |
| 3171 | + PATHS /opt/rocm/llvm/bin |
| 3172 | + ) |
| 3173 | + |
| 3174 | + find_program( |
| 3175 | + LLVM_COV |
| 3176 | + llvm-cov |
| 3177 | + REQUIRED |
| 3178 | + HINTS ${ROCM_PATH}/llvm/bin |
| 3179 | + PATHS /opt/rocm/llvm/bin |
| 3180 | + ) |
| 3181 | + |
| 3182 | + add_custom_target( |
| 3183 | + coverage |
| 3184 | + DEPENDS code_cov_tests |
| 3185 | + COMMAND ${LLVM_PROFDATA} merge -sparse ./coverage-report/profraw/hipfft-coverage_*.profraw -o ./coverage-report/hipfft.profdata |
| 3186 | + COMMAND ${LLVM_COV} report -object ./library/libhipfftw.so -object ./library/libhipfft.so -instr-profile=./coverage-report/hipfft.profdata |
| 3187 | + COMMAND ${LLVM_COV} show -object ./library/libhipfftw.so -object ./library/libhipfft.so -instr-profile=./coverage-report/hipfft.profdata -format=html -output-dir=coverage-report |
| 3188 | + WORKING_DIRECTORY ${CMAKE_BINARY_DIR} |
| 3189 | + ) |
| 3190 | +endif() |
| 3191 | diff --git a/clients/tests/accuracy_test_1D.cpp b/clients/tests/accuracy_test_1D.cpp |
| 3192 | index 91e238e..b2aa382 100644 |
| 3193 | --- a/clients/tests/accuracy_test_1D.cpp |
| 3194 | +++ b/clients/tests/accuracy_test_1D.cpp |
| 3195 | @@ -81,14 +81,8 @@ static std::vector<size_t> small_1D_sizes() |
| 3196 | static const size_t SMALL_1D_MAX = 8192; |
| 3197 | |
| 3198 | // generate a list of sizes from 2 and up, skipping any sizes that are already covered |
| 3199 | - std::vector<size_t> covered_sizes; |
| 3200 | - std::copy(pow2_range.begin(), pow2_range.end(), std::back_inserter(covered_sizes)); |
| 3201 | - std::copy(pow3_range.begin(), pow3_range.end(), std::back_inserter(covered_sizes)); |
| 3202 | - std::copy(pow5_range.begin(), pow5_range.end(), std::back_inserter(covered_sizes)); |
| 3203 | - std::copy(radX_range.begin(), radX_range.end(), std::back_inserter(covered_sizes)); |
| 3204 | - std::copy(mix_range.begin(), mix_range.end(), std::back_inserter(covered_sizes)); |
| 3205 | - std::copy(prime_range.begin(), prime_range.end(), std::back_inserter(covered_sizes)); |
| 3206 | - std::sort(covered_sizes.begin(), covered_sizes.end()); |
| 3207 | + std::vector<size_t> covered_sizes = merge_and_sort_values<size_t>( |
| 3208 | + {pow2_range, pow3_range, pow5_range, radX_range, mix_range, prime_range}); |
| 3209 | |
| 3210 | std::vector<size_t> output; |
| 3211 | for(size_t i = 2; i < SMALL_1D_MAX; ++i) |
| 3212 | @@ -319,29 +313,24 @@ INSTANTIATE_TEST_SUITE_P(DISABLED_offset_mix_1D, |
| 3213 | |
| 3214 | // small 1D sizes just need to make sure our factorization isn't |
| 3215 | // completely broken, so we just check simple C2C outplace interleaved |
| 3216 | -INSTANTIATE_TEST_SUITE_P(small_1D, |
| 3217 | - accuracy_test, |
| 3218 | - ::testing::ValuesIn(param_generator_base( |
| 3219 | - test_prob, |
| 3220 | - {fft_transform_type_complex_forward}, |
| 3221 | - generate_lengths({small_1D_sizes()}), |
| 3222 | - {fft_precision_single}, |
| 3223 | - {1}, |
| 3224 | - [](fft_transform_type t, |
| 3225 | - const std::vector<fft_result_placement>& place_range, |
| 3226 | - const bool planar) { |
| 3227 | - return std::vector<type_place_io_t>{ |
| 3228 | - std::make_tuple(t, |
| 3229 | - place_range[0], |
| 3230 | - fft_array_type_complex_interleaved, |
| 3231 | - fft_array_type_complex_interleaved)}; |
| 3232 | - }, |
| 3233 | - stride_range, |
| 3234 | - stride_range, |
| 3235 | - ioffset_range_zero, |
| 3236 | - ooffset_range_zero, |
| 3237 | - {fft_placement_notinplace})), |
| 3238 | - accuracy_test::TestName); |
| 3239 | +const static std::vector<size_t> small_1D_lengths = small_1D_sizes(); |
| 3240 | + |
| 3241 | +INSTANTIATE_TEST_SUITE_P( |
| 3242 | + small_1D, |
| 3243 | + accuracy_test, |
| 3244 | + ::testing::ValuesIn(param_generator_base(test_prob, |
| 3245 | + {fft_transform_type_complex_forward}, |
| 3246 | + generate_lengths({small_1D_lengths}), |
| 3247 | + {fft_precision_single}, |
| 3248 | + {1}, |
| 3249 | + generate_types, |
| 3250 | + stride_range, |
| 3251 | + stride_range, |
| 3252 | + ioffset_range_zero, |
| 3253 | + ooffset_range_zero, |
| 3254 | + {fft_placement_notinplace}, |
| 3255 | + false /* planar */)), |
| 3256 | + accuracy_test::TestName); |
| 3257 | |
| 3258 | // NB: |
| 3259 | // We have known non-unit strides issues for 1D: |
| 3260 | @@ -352,7 +341,7 @@ INSTANTIATE_TEST_SUITE_P(small_1D, |
| 3261 | // main tests. |
| 3262 | // |
| 3263 | // The below test covers non-unit strides, pow of 2, middle sizes, which has SBCC/SBRC kernels |
| 3264 | -// invloved. |
| 3265 | +// involved. |
| 3266 | const static std::vector<size_t> pow2_range_for_stride = {4096, 8192, 524288}; |
| 3267 | const static std::vector<size_t> pow2_range_for_stride_half = {4096, 8192}; |
| 3268 | const static std::vector<std::vector<size_t>> stride_range_for_pow2 = {{2}, {3}}; |
| 3269 | @@ -364,7 +353,7 @@ INSTANTIATE_TEST_SUITE_P( |
| 3270 | ::testing::ValuesIn(param_generator_complex(test_prob, |
| 3271 | generate_lengths({pow2_range_for_stride}), |
| 3272 | precision_range_sp_dp, |
| 3273 | - batch_range_1D, |
| 3274 | + batch_range_for_stride, |
| 3275 | stride_range_for_pow2, |
| 3276 | stride_range_for_pow2, |
| 3277 | ioffset_range_zero, |
| 3278 | @@ -380,7 +369,7 @@ INSTANTIATE_TEST_SUITE_P( |
| 3279 | ::testing::ValuesIn(param_generator_real(test_prob, |
| 3280 | generate_lengths({pow2_range_for_stride}), |
| 3281 | precision_range_sp_dp, |
| 3282 | - batch_range_1D, |
| 3283 | + batch_range_for_stride, |
| 3284 | stride_range_for_pow2, |
| 3285 | stride_range_for_pow2, |
| 3286 | ioffset_range_zero, |
| 3287 | @@ -396,7 +385,7 @@ INSTANTIATE_TEST_SUITE_P( |
| 3288 | ::testing::ValuesIn(param_generator_real(test_prob, |
| 3289 | generate_lengths({pow2_range_for_stride_half}), |
| 3290 | {fft_precision_half}, |
| 3291 | - batch_range_1D, |
| 3292 | + batch_range_for_stride, |
| 3293 | stride_range_for_pow2, |
| 3294 | stride_range_for_pow2, |
| 3295 | ioffset_range_zero, |
| 3296 | @@ -537,3 +526,24 @@ INSTANTIATE_TEST_SUITE_P( |
| 3297 | ooffset_range_zero, |
| 3298 | place_range)), |
| 3299 | accuracy_test::TestName); |
| 3300 | + |
| 3301 | +const static std::vector<size_t> lengths_for_disabled_autoalloc = merge_and_sort_values<size_t>( |
| 3302 | + {pow2_range, pow3_range, pow5_range, radX_range, mix_range, small_1D_lengths, prime_range}, |
| 3303 | + 128); |
| 3304 | + |
| 3305 | +INSTANTIATE_TEST_SUITE_P( |
| 3306 | + various_1D, |
| 3307 | + accuracy_test, |
| 3308 | + ::testing::ValuesIn(param_generator(test_prob, |
| 3309 | + generate_lengths({lengths_for_disabled_autoalloc}), |
| 3310 | + precision_range_sp_dp, |
| 3311 | + batch_range_1D, |
| 3312 | + stride_range, |
| 3313 | + stride_range, |
| 3314 | + ioffset_range_zero, |
| 3315 | + ooffset_range_zero, |
| 3316 | + place_range, |
| 3317 | + false, |
| 3318 | + false, |
| 3319 | + fft_auto_allocation_off)), |
| 3320 | + accuracy_test::TestName); |
| 3321 | diff --git a/clients/tests/accuracy_test_2D.cpp b/clients/tests/accuracy_test_2D.cpp |
| 3322 | index 5f1b8a9..6f45abe 100644 |
| 3323 | --- a/clients/tests/accuracy_test_2D.cpp |
| 3324 | +++ b/clients/tests/accuracy_test_2D.cpp |
| 3325 | @@ -278,3 +278,24 @@ INSTANTIATE_TEST_SUITE_P(len1_swap_2D, |
| 3326 | false, |
| 3327 | false)), |
| 3328 | accuracy_test::TestName); |
| 3329 | + |
| 3330 | +const static std::vector<size_t> lengths_for_disabled_autoalloc |
| 3331 | + = merge_and_sort_values<size_t>({pow2_range, pow3_range, prime_range, mix_range}, 12); |
| 3332 | + |
| 3333 | +INSTANTIATE_TEST_SUITE_P( |
| 3334 | + various_2D, |
| 3335 | + accuracy_test, |
| 3336 | + ::testing::ValuesIn(param_generator(test_prob, |
| 3337 | + generate_lengths({lengths_for_disabled_autoalloc, |
| 3338 | + lengths_for_disabled_autoalloc}), |
| 3339 | + precision_range_sp_dp, |
| 3340 | + batch_range, |
| 3341 | + stride_range, |
| 3342 | + stride_range, |
| 3343 | + ioffset_range_zero, |
| 3344 | + ooffset_range_zero, |
| 3345 | + place_range, |
| 3346 | + false, |
| 3347 | + false, |
| 3348 | + fft_auto_allocation_off)), |
| 3349 | + accuracy_test::TestName); |
| 3350 | diff --git a/clients/tests/accuracy_test_3D.cpp b/clients/tests/accuracy_test_3D.cpp |
| 3351 | index f338695..f3f780d 100644 |
| 3352 | --- a/clients/tests/accuracy_test_3D.cpp |
| 3353 | +++ b/clients/tests/accuracy_test_3D.cpp |
| 3354 | @@ -284,3 +284,25 @@ INSTANTIATE_TEST_SUITE_P( |
| 3355 | false, |
| 3356 | false)), |
| 3357 | accuracy_test::TestName); |
| 3358 | + |
| 3359 | +const static std::vector<size_t> lengths_for_disabled_autoalloc = merge_and_sort_values<size_t>( |
| 3360 | + {pow2_range, pow3_range, pow5_range, prime_range, sbrc_range}, 5); |
| 3361 | + |
| 3362 | +INSTANTIATE_TEST_SUITE_P( |
| 3363 | + various_3D, |
| 3364 | + accuracy_test, |
| 3365 | + ::testing::ValuesIn(param_generator(test_prob, |
| 3366 | + generate_lengths({lengths_for_disabled_autoalloc, |
| 3367 | + lengths_for_disabled_autoalloc, |
| 3368 | + lengths_for_disabled_autoalloc}), |
| 3369 | + precision_range_sp_dp, |
| 3370 | + batch_range, |
| 3371 | + stride_range, |
| 3372 | + stride_range, |
| 3373 | + ioffset_range_zero, |
| 3374 | + ooffset_range_zero, |
| 3375 | + place_range, |
| 3376 | + false, |
| 3377 | + false, |
| 3378 | + fft_auto_allocation_off)), |
| 3379 | + accuracy_test::TestName); |
| 3380 | diff --git a/clients/tests/accuracy_test_callback.cpp b/clients/tests/accuracy_test_callback.cpp |
| 3381 | index 00e6026..0bae646 100644 |
| 3382 | --- a/clients/tests/accuracy_test_callback.cpp |
| 3383 | +++ b/clients/tests/accuracy_test_callback.cpp |
| 3384 | @@ -104,22 +104,27 @@ const static std::vector<std::vector<size_t>> ooffset_range = {{0, 0}, {1, 1}}; |
| 3385 | auto transform_types = {fft_transform_type_complex_forward, fft_transform_type_real_forward}; |
| 3386 | |
| 3387 | #ifdef __HIP__ |
| 3388 | -INSTANTIATE_TEST_SUITE_P(callback, |
| 3389 | - accuracy_test, |
| 3390 | - ::testing::ValuesIn(param_generator_base(test_prob, |
| 3391 | - transform_types, |
| 3392 | - callback_sizes, |
| 3393 | - precision_range_sp_dp, |
| 3394 | - batch_range, |
| 3395 | - generate_types, |
| 3396 | - stride_range, |
| 3397 | - stride_range, |
| 3398 | - ioffset_range_zero, |
| 3399 | - ooffset_range_zero, |
| 3400 | - place_range, |
| 3401 | - false, |
| 3402 | - true)), |
| 3403 | - accuracy_test::TestName); |
| 3404 | +INSTANTIATE_TEST_SUITE_P( |
| 3405 | +#ifdef WIN32 |
| 3406 | + DISABLED_callback_no_offset, |
| 3407 | +#else |
| 3408 | + callback, |
| 3409 | +#endif |
| 3410 | + accuracy_test, |
| 3411 | + ::testing::ValuesIn(param_generator_base(test_prob, |
| 3412 | + transform_types, |
| 3413 | + callback_sizes, |
| 3414 | + precision_range_sp_dp, |
| 3415 | + batch_range, |
| 3416 | + generate_types, |
| 3417 | + stride_range, |
| 3418 | + stride_range, |
| 3419 | + ioffset_range_zero, |
| 3420 | + ooffset_range_zero, |
| 3421 | + place_range, |
| 3422 | + false, |
| 3423 | + true)), |
| 3424 | + accuracy_test::TestName); |
| 3425 | |
| 3426 | INSTANTIATE_TEST_SUITE_P(DISABLED_callback, |
| 3427 | accuracy_test, |
| 3428 | diff --git a/clients/tests/gtest_main.cpp b/clients/tests/gtest_main.cpp |
| 3429 | index b574106..ee2f5ec 100644 |
| 3430 | --- a/clients/tests/gtest_main.cpp |
| 3431 | +++ b/clients/tests/gtest_main.cpp |
| 3432 | @@ -41,20 +41,18 @@ |
| 3433 | #include "../hipfft_params.h" |
| 3434 | #include "hipfft/hipfft.h" |
| 3435 | #include "hipfft_accuracy_test.h" |
| 3436 | -#include "hipfft_test_params.h" |
| 3437 | + |
| 3438 | +// initialize static class member of hipfft_params |
| 3439 | +std::vector<gpubuf> hipfft_params::externally_managed_workareas = std::vector<gpubuf>(); |
| 3440 | |
| 3441 | // Control output verbosity: |
| 3442 | int verbose; |
| 3443 | |
| 3444 | -// Run a short (~5 min) test suite by setting test_prob to an appropriate value |
| 3445 | -bool smoketest = false; |
| 3446 | - |
| 3447 | // User-defined random seed |
| 3448 | -size_t random_seed; |
| 3449 | +size_t random_seed; |
| 3450 | +std::random_device default_seed_dev; |
| 3451 | // Overall probability of running conventional tests |
| 3452 | double test_prob; |
| 3453 | -// Probability of running tests from the emulation suite |
| 3454 | -double emulation_prob; |
| 3455 | // Modifier for probability of running tests with complex interleaved data |
| 3456 | double complex_interleaved_prob_factor; |
| 3457 | // Modifier for probability of running tests with real data |
| 3458 | @@ -63,9 +61,12 @@ double real_prob_factor; |
| 3459 | double complex_planar_prob_factor; |
| 3460 | // Modifier for probability of running tests with callbacks |
| 3461 | double callback_prob_factor; |
| 3462 | +// Constraints for the hipfftw tests |
| 3463 | +size_t max_length_for_hipfftw_test; |
| 3464 | +size_t max_io_gb_for_hipfftw_test; |
| 3465 | |
| 3466 | // Transform parameters for manual test: |
| 3467 | -fft_params manual_params; |
| 3468 | +hipfft_params manual_params; |
| 3469 | |
| 3470 | // Host memory limitation for tests (GiB): |
| 3471 | size_t ramgb; |
| 3472 | @@ -104,7 +105,7 @@ last_cpu_fft_cache last_cpu_fft_data; |
| 3473 | fft_params::fft_mp_lib mp_lib = fft_params::fft_mp_lib_none; |
| 3474 | // Number of multi-process ranks to launch |
| 3475 | int mp_ranks = 1; |
| 3476 | -// Multi-process launch command (e.g. mpirun --np 4 /path/to/rocfft_mpi_worker) |
| 3477 | +// Multi-process launch command (e.g. mpirun --np 4 /path/to/hipfft_mpi_worker) |
| 3478 | std::string mp_launch; |
| 3479 | |
| 3480 | void init_gtest_flags() |
| 3481 | @@ -174,9 +175,10 @@ void precompile_test_kernels(const std::string& precompile_file) |
| 3482 | continue; |
| 3483 | |
| 3484 | // only care about accuracy tests |
| 3485 | - if(name.find("vs_fftw/") != std::string::npos) |
| 3486 | + const auto pos = name.find("vs_fftw/"); |
| 3487 | + if(pos != std::string::npos) |
| 3488 | { |
| 3489 | - name.erase(0, 8); |
| 3490 | + name.erase(0, pos + 8); |
| 3491 | |
| 3492 | // change batch to 1, so we don't waste time creating |
| 3493 | // multiple plans that differ only by batch |
| 3494 | @@ -199,7 +201,7 @@ void precompile_test_kernels(const std::string& precompile_file) |
| 3495 | std::mt19937 dist(dev()); |
| 3496 | std::shuffle(tokens.begin(), tokens.end(), dist); |
| 3497 | auto precompile_begin = std::chrono::steady_clock::now(); |
| 3498 | - std::cout << "precompiling " << tokens.size() << " FFT plans...\n"; |
| 3499 | + std::cout << "precompiling kernels for " << tokens.size() << " tokens...\n"; |
| 3500 | |
| 3501 | for(auto&& t : tokens) |
| 3502 | tokenQueue.push(std::move(t)); |
| 3503 | @@ -222,6 +224,13 @@ void precompile_test_kernels(const std::string& precompile_file) |
| 3504 | params.from_token(token); |
| 3505 | params.validate(); |
| 3506 | params.create_plan(); |
| 3507 | + if(params.is_forward()) |
| 3508 | + { |
| 3509 | + hipfft_params inverse_params; |
| 3510 | + inverse_params.inverse_from_forward(params); |
| 3511 | + inverse_params.validate(); |
| 3512 | + inverse_params.create_plan(); |
| 3513 | + } |
| 3514 | } |
| 3515 | catch(fft_params::work_buffer_alloc_failure&) |
| 3516 | { |
| 3517 | @@ -271,26 +280,45 @@ int main(int argc, char* argv[]) |
| 3518 | " HP - hermitian planar\n" |
| 3519 | "\n" |
| 3520 | "Usage"}; |
| 3521 | - |
| 3522 | - // Override CLI11 help to print after later CLI11 options that are defined, and allow gtest's help |
| 3523 | + // Override CLI11 help to print it along gtest's help |
| 3524 | app.set_help_flag(""); |
| 3525 | - CLI::Option* opt_help = app.add_flag("-h, --help", "Produces this help message"); |
| 3526 | + const auto opt_help = app.add_flag("-h, --help", "Produces this help message"); |
| 3527 | app.add_option("-v, --verbose", verbose, "Print out detailed information for the tests") |
| 3528 | ->default_val(0); |
| 3529 | app.add_option("--test_prob", test_prob, "Probability of running individual tests") |
| 3530 | ->default_val(1.0) |
| 3531 | ->check(CLI::Range(0.0, 1.0)); |
| 3532 | + app.add_option("--real_prob", |
| 3533 | + real_prob_factor, |
| 3534 | + "Probability multiplier for running individual real/complex transforms") |
| 3535 | + ->default_val(1.0) |
| 3536 | + ->check(CLI::PositiveNumber); |
| 3537 | + app.add_option("--planar_prob", |
| 3538 | + complex_planar_prob_factor, |
| 3539 | + "Probability multiplier for running individual planar transforms") |
| 3540 | + ->default_val(0.1) |
| 3541 | + ->check(CLI::PositiveNumber); |
| 3542 | app.add_option( |
| 3543 | "--complex_interleaved_prob_factor", |
| 3544 | complex_interleaved_prob_factor, |
| 3545 | "Probability multiplier for running individual transforms with complex interleaved data") |
| 3546 | ->default_val(1) |
| 3547 | - ->check(CLI::NonNegativeNumber); |
| 3548 | + ->check(CLI::PositiveNumber); |
| 3549 | app.add_option("--callback_prob", |
| 3550 | callback_prob_factor, |
| 3551 | "Probability multiplier for running individual callback transforms") |
| 3552 | ->default_val(0.1) |
| 3553 | ->check(CLI::NonNegativeNumber); |
| 3554 | + app.add_option("--max_hipfftw_test_len", |
| 3555 | + max_length_for_hipfftw_test, |
| 3556 | + "Maximum length to be considered in hipfftw tests") |
| 3557 | + ->default_val(8192) |
| 3558 | + ->check(CLI::PositiveNumber); |
| 3559 | + app.add_option("--max_io_gb_for_hipfftw_test", |
| 3560 | + max_io_gb_for_hipfftw_test, |
| 3561 | + "Maximum size of I/O to be considered in hipfftw tests in GiB") |
| 3562 | + ->default_val(1) /* 1 GiB */ |
| 3563 | + ->check(CLI::PositiveNumber); |
| 3564 | |
| 3565 | app.add_option("--fftw_compare", fftw_compare, "Compare to FFTW in accuracy tests") |
| 3566 | ->default_val(true); |
| 3567 | @@ -302,71 +330,43 @@ int main(int argc, char* argv[]) |
| 3568 | app.add_option("--mp_launch", |
| 3569 | mp_launch, |
| 3570 | "Command line prefix to launch multi-process transforms, e.g. \"mpirun --np 4 " |
| 3571 | - "/path/to/rocfft_mpi_worker\"") |
| 3572 | + "/path/to/hipfft_mpi_worker\"") |
| 3573 | ->default_val("") |
| 3574 | ->each([&](const std::string&) { |
| 3575 | if(mp_lib == fft_params::fft_mp_lib_none) |
| 3576 | { |
| 3577 | - std::cout << "--mp_launch requires an mp library (see mp_lib in --help).\n"; |
| 3578 | - std::exit(-1); |
| 3579 | + throw CLI::ValidationError( |
| 3580 | + "--mp_launch requires an mp library (see mp_lib in --help)"); |
| 3581 | } |
| 3582 | }) |
| 3583 | ->needs("--mp_lib"); |
| 3584 | - // FIXME: Seed has no use currently |
| 3585 | - // CLI::Option* opt_seed = |
| 3586 | - app.add_option("--seed", random_seed, "Random seed; if unset, use an actual random seed"); |
| 3587 | + app.add_option("--seed", random_seed, "Random seed; if unset, use an actual random seed") |
| 3588 | + ->default_val(default_seed_dev()); |
| 3589 | app.add_flag("--smoketest", "Run a short (approx 5 minute) randomized selection of tests") |
| 3590 | ->each([&](const std::string&) { |
| 3591 | - // The objective is to have an test that takes about 5 minutes, so just set the probability |
| 3592 | - // per test to a small value to achieve this result. |
| 3593 | + // The objective is to have an test that takes about 5 minutes, so just set the |
| 3594 | + // probability per test to a small value to achieve this result. |
| 3595 | test_prob = 0.002; |
| 3596 | }); |
| 3597 | - |
| 3598 | - // Try parsing initial args that will be used to configure tests |
| 3599 | - // Allow extras to pass on gtest and hipFFT arguments without error |
| 3600 | - app.allow_extras(); |
| 3601 | - try |
| 3602 | - { |
| 3603 | - app.parse(argc, argv); |
| 3604 | - } |
| 3605 | - catch(const CLI::ParseError& e) |
| 3606 | - { |
| 3607 | - return app.exit(e); |
| 3608 | - } |
| 3609 | - |
| 3610 | - // NB: If we initialize gtest first, then it removes all of its own command-line |
| 3611 | - // arguments and sets argc and argv correctly; |
| 3612 | - ::testing::InitGoogleTest(&argc, argv); |
| 3613 | - |
| 3614 | - // Filename for fftw and fftwf wisdom. |
| 3615 | - std::string fftw_wisdom_filename; |
| 3616 | - |
| 3617 | // Token string to fully specify fft params for the manual test. |
| 3618 | std::string test_token; |
| 3619 | - |
| 3620 | - // Filename for precompiled kernels to be written to |
| 3621 | - std::string precompile_file; |
| 3622 | - |
| 3623 | - // Declare the supported options. Some option pointers are declared to track passed opts. |
| 3624 | - app.add_flag("--callback", "Inject load/store callbacks")->each([&](const std::string&) { |
| 3625 | - manual_params.run_callbacks = true; |
| 3626 | - }); |
| 3627 | - // app.add_flag("--version", "Print queryable version information from the rocfft library") |
| 3628 | - // ->each([](const std::string&) { |
| 3629 | - // rocfft_setup(); |
| 3630 | - // char v[256]; |
| 3631 | - // rocfft_get_version_string(v, 256); |
| 3632 | - // std::cout << "rocFFT version: " << v << std::endl; |
| 3633 | - // return EXIT_SUCCESS; |
| 3634 | - // }); |
| 3635 | - CLI::Option* opt_token |
| 3636 | + auto* opt_token |
| 3637 | = app.add_option("--token", test_token, "Test token name for manual test")->default_val(""); |
| 3638 | // Group together options that conflict with --token |
| 3639 | auto* non_token = app.add_option_group("Token Conflict", "Options excluded by --token"); |
| 3640 | + non_token->excludes(opt_token); |
| 3641 | + // Declare the supported options. Some option pointers are declared to track passed opts. |
| 3642 | + non_token->add_flag("--callback", "Inject load/store callbacks")->each([&](const std::string&) { |
| 3643 | + manual_params.run_callbacks = true; |
| 3644 | + }); |
| 3645 | + non_token |
| 3646 | + ->add_option("--auto_allocation", |
| 3647 | + manual_params.auto_allocate, |
| 3648 | + "hipFFT's auto-allocation behavior: \"on\", \"off\", or \"default\"") |
| 3649 | + ->default_val("default"); |
| 3650 | non_token |
| 3651 | ->add_flag("--double", "Double precision transform (deprecated: use --precision double)") |
| 3652 | ->each([&](const std::string&) { manual_params.precision = fft_precision_double; }); |
| 3653 | - non_token->excludes(opt_token); |
| 3654 | non_token |
| 3655 | ->add_option("-t, --transformType", |
| 3656 | manual_params.transform_type, |
| 3657 | @@ -406,8 +406,20 @@ int main(int argc, char* argv[]) |
| 3658 | ->default_val(0); |
| 3659 | non_token->add_option("--ioffset", manual_params.ioffset, "Input offset"); |
| 3660 | non_token->add_option("--ooffset", manual_params.ooffset, "Output offset"); |
| 3661 | - app.add_option("--isize", manual_params.isize, "Logical size of input buffer"); |
| 3662 | - app.add_option("--osize", manual_params.osize, "Logical size of output buffer"); |
| 3663 | + non_token->add_option("--isize", manual_params.isize, "Logical size of input buffer"); |
| 3664 | + non_token->add_option("--osize", manual_params.osize, "Logical size of output buffer"); |
| 3665 | + non_token->add_option( |
| 3666 | + "--scalefactor", manual_params.scale_factor, "Scale factor to apply to output"); |
| 3667 | + // Default value is set in fft_params.h based on if device-side PRNG was enabled. |
| 3668 | + non_token->add_option("-g, --inputGen", |
| 3669 | + manual_params.igen, |
| 3670 | + "Input data generation:\n0) PRNG sequence (device)\n" |
| 3671 | + "1) PRNG sequence (host)\n" |
| 3672 | + "2) linearly-spaced sequence (device)\n" |
| 3673 | + "3) linearly-spaced sequence (host)"); |
| 3674 | + const auto* opt_version = app.add_flag( |
| 3675 | + "--version", |
| 3676 | + "Print queryable version information from the hipfft library's backend (and return)"); |
| 3677 | app.add_option("--R", ramgb, "RAM limit in GiB for tests") |
| 3678 | ->default_val(host_memory::singleton().get_total_gbytes()); |
| 3679 | app.add_option("--V", vramgb, "VRAM limit in GiB for tests")->default_val(0); |
| 3680 | @@ -419,22 +431,19 @@ int main(int argc, char* argv[]) |
| 3681 | "Skip the test if there is a runtime failure") |
| 3682 | ->default_val(true); |
| 3683 | app.add_option("-w, --wise", use_fftw_wisdom, "Use FFTW wisdom"); |
| 3684 | + // Filename for fftw and fftwf wisdom. |
| 3685 | + std::string fftw_wisdom_filename; |
| 3686 | app.add_option("-W, --wisdomfile", fftw_wisdom_filename, "FFTW3 wisdom filename") |
| 3687 | ->default_val("wisdom3.txt"); |
| 3688 | - app.add_option("--scalefactor", manual_params.scale_factor, "Scale factor to apply to output"); |
| 3689 | + // Filename for precompiled kernels to be written to |
| 3690 | + std::string precompile_file; |
| 3691 | app.add_option("--precompile", |
| 3692 | precompile_file, |
| 3693 | "Precompile kernels to a file for all test cases before running tests") |
| 3694 | ->default_val(""); |
| 3695 | - // Default value is set in fft_params.h based on if device-side PRNG was enabled. |
| 3696 | - app.add_option("-g, --inputGen", |
| 3697 | - manual_params.igen, |
| 3698 | - "Input data generation:\n0) PRNG sequence (device)\n" |
| 3699 | - "1) PRNG sequence (host)\n" |
| 3700 | - "2) linearly-spaced sequence (device)\n" |
| 3701 | - "3) linearly-spaced sequence (host)"); |
| 3702 | - |
| 3703 | - // Parse rest of args and catch any errors here |
| 3704 | + // Try parsing initial args that will be used to configure tests |
| 3705 | + // Allow extras to pass on gtest arguments without error |
| 3706 | + app.allow_extras(); |
| 3707 | try |
| 3708 | { |
| 3709 | app.parse(argc, argv); |
| 3710 | @@ -444,26 +453,21 @@ int main(int argc, char* argv[]) |
| 3711 | return app.exit(e); |
| 3712 | } |
| 3713 | |
| 3714 | - if(*opt_help) |
| 3715 | + if(!test_token.empty()) |
| 3716 | { |
| 3717 | - std::cout << app.help() << "\n"; |
| 3718 | - return EXIT_SUCCESS; |
| 3719 | - } |
| 3720 | + std::cout << "Reading fft params from token:\n" << test_token << std::endl; |
| 3721 | |
| 3722 | - // Ensure there are no leftover options used by neither gtest nor CLI11 |
| 3723 | - std::vector<std::string> remaining_args = app.remaining(); |
| 3724 | - if(!remaining_args.empty()) |
| 3725 | - { |
| 3726 | - std::cout << "Unrecognised option(s) found:\n "; |
| 3727 | - for(auto i : app.remaining()) |
| 3728 | - std::cout << i << " "; |
| 3729 | - std::cout << "\nRun with --help for more information.\n"; |
| 3730 | - return EXIT_FAILURE; |
| 3731 | + try |
| 3732 | + { |
| 3733 | + manual_params.from_token(test_token); |
| 3734 | + std::cout << "manual_params.token() = " << manual_params.token() << std::endl; |
| 3735 | + } |
| 3736 | + catch(...) |
| 3737 | + { |
| 3738 | + std::cout << "Unable to parse token." << std::endl; |
| 3739 | + return 1; |
| 3740 | + } |
| 3741 | } |
| 3742 | - |
| 3743 | - std::cout << "half epsilon: " << half_epsilon << "\tsingle epsilon: " << single_epsilon |
| 3744 | - << "\tdouble epsilon: " << double_epsilon << std::endl; |
| 3745 | - |
| 3746 | if(manual_params.length.empty()) |
| 3747 | { |
| 3748 | manual_params.length.push_back(8); |
| 3749 | @@ -482,6 +486,66 @@ int main(int argc, char* argv[]) |
| 3750 | // TODO: add random size? |
| 3751 | } |
| 3752 | |
| 3753 | + // User-settable options defining the values of all the actual test parameters |
| 3754 | + // (e.g., probability factors and value of manual_params) must be handled |
| 3755 | + // before invoking ::testing::InitGoogleTest as it triggers evaluation of said |
| 3756 | + // parameters (e.g., args of "::testing::Values{In}" in instantiations of test |
| 3757 | + // suites). |
| 3758 | + // set any "unset" parameters of manual_params before initiating gtests |
| 3759 | + // (makes the token reported by gtest less ambiguous) |
| 3760 | + manual_params.validate(); |
| 3761 | + |
| 3762 | + // extract remaining arguments for subsequent gtest initialization |
| 3763 | + std::vector<std::string> remaining_args = app.remaining(); |
| 3764 | + std::string gtest_help_opt = "--help"; |
| 3765 | + // NB: If we initialize gtest first, then it removes all of its own command-line |
| 3766 | + // arguments and sets argc and argv correctly; |
| 3767 | + std::vector<char*> gtest_argv; |
| 3768 | + gtest_argv.insert(gtest_argv.begin(), argv[0]); |
| 3769 | + for(std::string& s : remaining_args) |
| 3770 | + { |
| 3771 | + gtest_argv.push_back(&s[0]); |
| 3772 | + } |
| 3773 | + if(*opt_help) |
| 3774 | + { |
| 3775 | + // make sure gtest prints its help as well |
| 3776 | + gtest_argv.push_back(>est_help_opt[0]); |
| 3777 | + } |
| 3778 | + gtest_argv.push_back(NULL); |
| 3779 | + decltype(argc) gtest_argc = gtest_argv.size() - 1; |
| 3780 | + ::testing::InitGoogleTest(>est_argc, gtest_argv.data()); // gtest-relevant args are removed |
| 3781 | + |
| 3782 | + if(*opt_help) |
| 3783 | + { |
| 3784 | + std::cout << app.help() << "\n"; |
| 3785 | + return EXIT_SUCCESS; |
| 3786 | + } |
| 3787 | + // no help was used, gtest_argc is expected to be 1 at this point. If not, some of the |
| 3788 | + // used options were not recognized at all |
| 3789 | + if(gtest_argc > 1) |
| 3790 | + { |
| 3791 | + std::cout << "Unrecognised option(s) found:\n "; |
| 3792 | + for(auto i = 1; i < gtest_argc; i++) |
| 3793 | + std::cout << gtest_argv[i] << " "; |
| 3794 | + std::cout << "\nRun with --help for more information.\n"; |
| 3795 | + return EXIT_FAILURE; |
| 3796 | + } |
| 3797 | + |
| 3798 | + if(*opt_version || verbose > 0) |
| 3799 | + { |
| 3800 | + int hipfft_version; |
| 3801 | + hipfftGetVersion(&hipfft_version); |
| 3802 | + std::cout << "hipFFT version: " << hipfft_version << std::endl; |
| 3803 | + if(*opt_version) |
| 3804 | + { |
| 3805 | + return EXIT_SUCCESS; |
| 3806 | + } |
| 3807 | + } |
| 3808 | + |
| 3809 | + std::cout << "Using random_seed = " << random_seed << std::endl; |
| 3810 | + std::cout << "half epsilon: " << half_epsilon << "\tsingle epsilon: " << single_epsilon |
| 3811 | + << "\tdouble epsilon: " << double_epsilon << std::endl; |
| 3812 | + |
| 3813 | // if precompiling, tell rocFFT to use the specified cache file |
| 3814 | // to write kernels to |
| 3815 | // |
| 3816 | @@ -494,11 +558,6 @@ int main(int argc, char* argv[]) |
| 3817 | precompile_file.c_str()); |
| 3818 | } |
| 3819 | |
| 3820 | - // rocfft_setup(); |
| 3821 | - // char v[256]; |
| 3822 | - // rocfft_get_version_string(v, 256); |
| 3823 | - // std::cout << "rocFFT version: " << v << std::endl; |
| 3824 | - |
| 3825 | #ifdef FFTW_MULTITHREAD |
| 3826 | fftw_init_threads(); |
| 3827 | fftwf_init_threads(); |
| 3828 | @@ -508,6 +567,7 @@ int main(int argc, char* argv[]) |
| 3829 | |
| 3830 | // Set host memory limit from command-line options |
| 3831 | host_memory::singleton().set_limit_gbytes(ramgb); |
| 3832 | + std::cout << "Host memory limit: " << ramgb << " GiB" << std::endl; |
| 3833 | |
| 3834 | if(use_fftw_wisdom) |
| 3835 | { |
| 3836 | @@ -555,21 +615,6 @@ int main(int argc, char* argv[]) |
| 3837 | fftwf_import_wisdom_from_string(fftwf_wisdom.c_str()); |
| 3838 | } |
| 3839 | |
| 3840 | - if(!test_token.empty()) |
| 3841 | - { |
| 3842 | - std::cout << "Reading fft params from token:\n" << test_token << std::endl; |
| 3843 | - |
| 3844 | - try |
| 3845 | - { |
| 3846 | - manual_params.from_token(test_token); |
| 3847 | - } |
| 3848 | - catch(...) |
| 3849 | - { |
| 3850 | - std::cout << "Unable to parse token." << std::endl; |
| 3851 | - return 1; |
| 3852 | - } |
| 3853 | - } |
| 3854 | - |
| 3855 | if(!precompile_file.empty()) |
| 3856 | precompile_test_kernels(precompile_file); |
| 3857 | |
| 3858 | @@ -593,38 +638,14 @@ int main(int argc, char* argv[]) |
| 3859 | std::cout << "double precision max l-inf epsilon: " << max_linf_eps_double << std::endl; |
| 3860 | std::cout << "double precision max l2 epsilon: " << max_l2_eps_double << std::endl; |
| 3861 | |
| 3862 | - // rocfft_cleanup(); |
| 3863 | + hipfft_params::externally_managed_workareas.clear(); |
| 3864 | + |
| 3865 | return retval; |
| 3866 | } |
| 3867 | |
| 3868 | -TEST(manual, vs_fftw) |
| 3869 | -{ |
| 3870 | - // Run an individual test using the provided command-line parameters. |
| 3871 | - |
| 3872 | - std::cout << "Manual test:" << std::endl; |
| 3873 | - |
| 3874 | - manual_params.validate(); |
| 3875 | - |
| 3876 | - std::cout << "Token: " << manual_params.token() << std::endl; |
| 3877 | - |
| 3878 | - hipfft_params params(manual_params); |
| 3879 | - |
| 3880 | - try |
| 3881 | - { |
| 3882 | - fft_vs_reference(params, false); |
| 3883 | - } |
| 3884 | - catch(HOSTBUF_MEM_USAGE& e) |
| 3885 | - { |
| 3886 | - // explicitly clear test cache |
| 3887 | - last_cpu_fft_data = last_cpu_fft_cache(); |
| 3888 | - GTEST_SKIP() << e.msg; |
| 3889 | - } |
| 3890 | - catch(ROCFFT_SKIP& e) |
| 3891 | - { |
| 3892 | - GTEST_SKIP() << e.msg; |
| 3893 | - } |
| 3894 | - catch(ROCFFT_FAIL& e) |
| 3895 | - { |
| 3896 | - GTEST_FAIL() << e.msg; |
| 3897 | - } |
| 3898 | -} |
| 3899 | +// instantiation of the paramameterized accuracy_test for the |
| 3900 | +// configuration set manually: |
| 3901 | +INSTANTIATE_TEST_SUITE_P(manual, |
| 3902 | + accuracy_test, |
| 3903 | + ::testing::Values(static_cast<const fft_params&>(manual_params)), |
| 3904 | + accuracy_test::TestName); |
| 3905 | diff --git a/clients/tests/hipfft_accuracy_test.cpp b/clients/tests/hipfft_accuracy_test.cpp |
| 3906 | index e228c5f..a5c763e 100644 |
| 3907 | --- a/clients/tests/hipfft_accuracy_test.cpp |
| 3908 | +++ b/clients/tests/hipfft_accuracy_test.cpp |
| 3909 | @@ -34,6 +34,7 @@ |
| 3910 | #include "../../shared/accuracy_test.h" |
| 3911 | #include "../../shared/fftw_transform.h" |
| 3912 | #include "../../shared/gpubuf.h" |
| 3913 | +#include "../../shared/params_gen.h" |
| 3914 | #include "../../shared/rocfft_against_fftw.h" |
| 3915 | #include "../../shared/rocfft_complex.h" |
| 3916 | #include "../../shared/subprocess.h" |
| 3917 | @@ -42,6 +43,21 @@ extern std::string mp_launch; |
| 3918 | |
| 3919 | extern last_cpu_fft_cache last_cpu_fft_data; |
| 3920 | |
| 3921 | +// clang-format off |
| 3922 | +// tokens of tests found to be symptomatic |
| 3923 | +static const std::vector<std::string> symptomatic_tokens = { |
| 3924 | +#ifndef _CUFFT_BACKEND |
| 3925 | +// cases specific to ROCM backend |
| 3926 | +#else |
| 3927 | + // cases specific to CUFFT backend |
| 3928 | + "real_forward_len_16384_half_ip_batch_4_istride_1_R_ostride_1_HI_idist_16386_odist_8193_ioffset_0_0_ooffset_0_0", |
| 3929 | + "real_forward_len_32768_half_ip_batch_4_istride_1_R_ostride_1_HI_idist_32770_odist_16385_ioffset_0_0_ooffset_0_0", |
| 3930 | + "real_forward_len_65536_half_ip_batch_2_istride_1_R_ostride_1_HI_idist_65538_odist_32769_ioffset_0_0_ooffset_0_0", |
| 3931 | +#endif |
| 3932 | + // common to both backends |
| 3933 | +}; |
| 3934 | +// clang-format on |
| 3935 | + |
| 3936 | void fft_vs_reference(hipfft_params& params, bool round_trip) |
| 3937 | { |
| 3938 | switch(params.precision) |
| 3939 | @@ -78,29 +94,43 @@ TEST_P(accuracy_test, vs_fftw) |
| 3940 | { |
| 3941 | case fft_params::fft_mp_lib_none: |
| 3942 | { |
| 3943 | - // only do round trip for single-GPU FFTs |
| 3944 | - bool round_trip = params.multiGPU <= 1; |
| 3945 | + // skipping symptomatic case(s), unless forcefully/knowingly executing normally-disabled |
| 3946 | + // test tokens (e.g., by using --gtest_also_run_disabled_tests) |
| 3947 | + const char* test_suite_name |
| 3948 | + = ::testing::UnitTest::GetInstance()->current_test_info()->test_suite_name(); |
| 3949 | + if(!symptomatic_tokens.empty() && std::strstr(test_suite_name, "DISABLED") == nullptr |
| 3950 | + && std::find(symptomatic_tokens.begin(), symptomatic_tokens.end(), params.token()) |
| 3951 | + != symptomatic_tokens.end()) |
| 3952 | + { |
| 3953 | + GTEST_SKIP() |
| 3954 | + << "Symptomatic test that's currently disabled by default (force-skipping). Use " |
| 3955 | + "CLI arguments '--gtest_also_run_disabled_tests' to force the test execution " |
| 3956 | + "(via another test suite)."; |
| 3957 | + } |
| 3958 | + // only do round trip for forward FFTs |
| 3959 | + const bool do_round_trip = params.is_forward(); |
| 3960 | |
| 3961 | - if(!params.run_callbacks) |
| 3962 | + try |
| 3963 | { |
| 3964 | - try |
| 3965 | - { |
| 3966 | - fft_vs_reference(params, round_trip); |
| 3967 | - } |
| 3968 | - catch(HOSTBUF_MEM_USAGE& e) |
| 3969 | - { |
| 3970 | - // explicitly clear cache |
| 3971 | - last_cpu_fft_data = last_cpu_fft_cache(); |
| 3972 | - GTEST_SKIP() << e.msg; |
| 3973 | - } |
| 3974 | - catch(ROCFFT_SKIP& e) |
| 3975 | - { |
| 3976 | - GTEST_SKIP() << e.msg; |
| 3977 | - } |
| 3978 | - catch(ROCFFT_FAIL& e) |
| 3979 | - { |
| 3980 | - GTEST_FAIL() << e.msg; |
| 3981 | - } |
| 3982 | + fft_vs_reference(params, do_round_trip); |
| 3983 | + } |
| 3984 | + catch(HOSTBUF_MEM_USAGE& e) |
| 3985 | + { |
| 3986 | + // explicitly clear cache |
| 3987 | + last_cpu_fft_data = last_cpu_fft_cache(); |
| 3988 | + GTEST_SKIP() << e.msg; |
| 3989 | + } |
| 3990 | + catch(ROCFFT_SKIP& e) |
| 3991 | + { |
| 3992 | + GTEST_SKIP() << e.msg; |
| 3993 | + } |
| 3994 | + catch(const fft_params::unimplemented_exception& e) |
| 3995 | + { |
| 3996 | + GTEST_SKIP() << "Unimplemented exception: " << e.what(); |
| 3997 | + } |
| 3998 | + catch(ROCFFT_FAIL& e) |
| 3999 | + { |
| 4000 | + GTEST_FAIL() << e.msg; |
| 4001 | } |
| 4002 | break; |
| 4003 | } |
| 4004 | @@ -140,6 +170,11 @@ TEST_P(accuracy_test, vs_fftw) |
| 4005 | SUCCEED(); |
| 4006 | } |
| 4007 | |
| 4008 | +INSTANTIATE_TEST_SUITE_P(DISABLED_symptomatic_tokens, |
| 4009 | + accuracy_test, |
| 4010 | + ::testing::ValuesIn(param_generator_token(test_prob, symptomatic_tokens)), |
| 4011 | + accuracy_test::TestName); |
| 4012 | + |
| 4013 | #ifdef __HIP__ |
| 4014 | |
| 4015 | // load/store callbacks - cbdata in each is actually a scalar double |
| 4016 | @@ -166,6 +201,37 @@ __device__ auto load_callback_dev_complex_float = load_callback<rocfft_complex< |
| 4017 | __device__ auto load_callback_dev_double = load_callback<double>; |
| 4018 | __device__ auto load_callback_dev_complex_double = load_callback<rocfft_complex<double>>; |
| 4019 | |
| 4020 | +// load/store callbacks - cbdata in each is actually a scalar double |
| 4021 | +// with a number to apply to each element |
| 4022 | +template <typename Tdata> |
| 4023 | +__host__ __device__ Tdata |
| 4024 | + load_callback_round_trip_inverse(Tdata* input, size_t offset, void* cbdata, void* sharedMem) |
| 4025 | +{ |
| 4026 | + auto testdata = static_cast<const callback_test_data*>(cbdata); |
| 4027 | + // subtract each element by scalar |
| 4028 | + if(input == testdata->base) |
| 4029 | + return input[offset] - testdata->scalar; |
| 4030 | + // wrong base address passed, return something obviously wrong |
| 4031 | + else |
| 4032 | + { |
| 4033 | + // wrong base address passed, return something obviously wrong |
| 4034 | + return input[0]; |
| 4035 | + } |
| 4036 | +} |
| 4037 | + |
| 4038 | +__device__ auto load_callback_round_trip_inverse_dev_half |
| 4039 | + = load_callback_round_trip_inverse<rocfft_fp16>; |
| 4040 | +__device__ auto load_callback_round_trip_inverse_dev_complex_half |
| 4041 | + = load_callback_round_trip_inverse<rocfft_complex<rocfft_fp16>>; |
| 4042 | +__device__ auto load_callback_round_trip_inverse_dev_float |
| 4043 | + = load_callback_round_trip_inverse<float>; |
| 4044 | +__device__ auto load_callback_round_trip_inverse_dev_complex_float |
| 4045 | + = load_callback_round_trip_inverse<rocfft_complex<float>>; |
| 4046 | +__device__ auto load_callback_round_trip_inverse_dev_double |
| 4047 | + = load_callback_round_trip_inverse<double>; |
| 4048 | +__device__ auto load_callback_round_trip_inverse_dev_complex_double |
| 4049 | + = load_callback_round_trip_inverse<rocfft_complex<double>>; |
| 4050 | + |
| 4051 | void* get_load_callback_host(fft_array_type itype, |
| 4052 | fft_precision precision, |
| 4053 | bool round_trip_inverse = false) |
| 4054 | @@ -179,22 +245,55 @@ void* get_load_callback_host(fft_array_type itype, |
| 4055 | switch(precision) |
| 4056 | { |
| 4057 | case fft_precision_half: |
| 4058 | - EXPECT_EQ(hipMemcpyFromSymbol(&load_callback_host, |
| 4059 | - HIP_SYMBOL(load_callback_dev_complex_half), |
| 4060 | - sizeof(void*)), |
| 4061 | - hipSuccess); |
| 4062 | + if(round_trip_inverse) |
| 4063 | + { |
| 4064 | + EXPECT_EQ(hipMemcpyFromSymbol( |
| 4065 | + &load_callback_host, |
| 4066 | + HIP_SYMBOL(load_callback_round_trip_inverse_dev_complex_half), |
| 4067 | + sizeof(void*)), |
| 4068 | + hipSuccess); |
| 4069 | + } |
| 4070 | + else |
| 4071 | + { |
| 4072 | + EXPECT_EQ(hipMemcpyFromSymbol(&load_callback_host, |
| 4073 | + HIP_SYMBOL(load_callback_dev_complex_half), |
| 4074 | + sizeof(void*)), |
| 4075 | + hipSuccess); |
| 4076 | + } |
| 4077 | return load_callback_host; |
| 4078 | case fft_precision_single: |
| 4079 | - EXPECT_EQ(hipMemcpyFromSymbol(&load_callback_host, |
| 4080 | - HIP_SYMBOL(load_callback_dev_complex_float), |
| 4081 | - sizeof(void*)), |
| 4082 | - hipSuccess); |
| 4083 | + if(round_trip_inverse) |
| 4084 | + { |
| 4085 | + EXPECT_EQ(hipMemcpyFromSymbol( |
| 4086 | + &load_callback_host, |
| 4087 | + HIP_SYMBOL(load_callback_round_trip_inverse_dev_complex_float), |
| 4088 | + sizeof(void*)), |
| 4089 | + hipSuccess); |
| 4090 | + } |
| 4091 | + else |
| 4092 | + { |
| 4093 | + EXPECT_EQ(hipMemcpyFromSymbol(&load_callback_host, |
| 4094 | + HIP_SYMBOL(load_callback_dev_complex_float), |
| 4095 | + sizeof(void*)), |
| 4096 | + hipSuccess); |
| 4097 | + } |
| 4098 | return load_callback_host; |
| 4099 | case fft_precision_double: |
| 4100 | - EXPECT_EQ(hipMemcpyFromSymbol(&load_callback_host, |
| 4101 | - HIP_SYMBOL(load_callback_dev_complex_double), |
| 4102 | - sizeof(void*)), |
| 4103 | - hipSuccess); |
| 4104 | + if(round_trip_inverse) |
| 4105 | + { |
| 4106 | + EXPECT_EQ(hipMemcpyFromSymbol( |
| 4107 | + &load_callback_host, |
| 4108 | + HIP_SYMBOL(load_callback_round_trip_inverse_dev_complex_double), |
| 4109 | + sizeof(void*)), |
| 4110 | + hipSuccess); |
| 4111 | + } |
| 4112 | + else |
| 4113 | + { |
| 4114 | + EXPECT_EQ(hipMemcpyFromSymbol(&load_callback_host, |
| 4115 | + HIP_SYMBOL(load_callback_dev_complex_double), |
| 4116 | + sizeof(void*)), |
| 4117 | + hipSuccess); |
| 4118 | + } |
| 4119 | return load_callback_host; |
| 4120 | } |
| 4121 | } |
| 4122 | @@ -203,19 +302,55 @@ void* get_load_callback_host(fft_array_type itype, |
| 4123 | switch(precision) |
| 4124 | { |
| 4125 | case fft_precision_half: |
| 4126 | - EXPECT_EQ(hipMemcpyFromSymbol( |
| 4127 | - &load_callback_host, HIP_SYMBOL(load_callback_dev_half), sizeof(void*)), |
| 4128 | - hipSuccess); |
| 4129 | + if(round_trip_inverse) |
| 4130 | + { |
| 4131 | + EXPECT_EQ(hipMemcpyFromSymbol(&load_callback_host, |
| 4132 | + HIP_SYMBOL(load_callback_round_trip_inverse_dev_half), |
| 4133 | + sizeof(void*)), |
| 4134 | + hipSuccess); |
| 4135 | + } |
| 4136 | + else |
| 4137 | + { |
| 4138 | + EXPECT_EQ(hipMemcpyFromSymbol(&load_callback_host, |
| 4139 | + HIP_SYMBOL(load_callback_dev_half), |
| 4140 | + sizeof(void*)), |
| 4141 | + hipSuccess); |
| 4142 | + } |
| 4143 | return load_callback_host; |
| 4144 | case fft_precision_single: |
| 4145 | - EXPECT_EQ(hipMemcpyFromSymbol( |
| 4146 | - &load_callback_host, HIP_SYMBOL(load_callback_dev_float), sizeof(void*)), |
| 4147 | - hipSuccess); |
| 4148 | + if(round_trip_inverse) |
| 4149 | + { |
| 4150 | + EXPECT_EQ( |
| 4151 | + hipMemcpyFromSymbol(&load_callback_host, |
| 4152 | + HIP_SYMBOL(load_callback_round_trip_inverse_dev_float), |
| 4153 | + sizeof(void*)), |
| 4154 | + hipSuccess); |
| 4155 | + } |
| 4156 | + else |
| 4157 | + { |
| 4158 | + EXPECT_EQ(hipMemcpyFromSymbol(&load_callback_host, |
| 4159 | + HIP_SYMBOL(load_callback_dev_float), |
| 4160 | + sizeof(void*)), |
| 4161 | + hipSuccess); |
| 4162 | + } |
| 4163 | return load_callback_host; |
| 4164 | case fft_precision_double: |
| 4165 | - EXPECT_EQ(hipMemcpyFromSymbol( |
| 4166 | - &load_callback_host, HIP_SYMBOL(load_callback_dev_double), sizeof(void*)), |
| 4167 | - hipSuccess); |
| 4168 | + |
| 4169 | + if(round_trip_inverse) |
| 4170 | + { |
| 4171 | + EXPECT_EQ( |
| 4172 | + hipMemcpyFromSymbol(&load_callback_host, |
| 4173 | + HIP_SYMBOL(load_callback_round_trip_inverse_dev_double), |
| 4174 | + sizeof(void*)), |
| 4175 | + hipSuccess); |
| 4176 | + } |
| 4177 | + else |
| 4178 | + { |
| 4179 | + EXPECT_EQ(hipMemcpyFromSymbol(&load_callback_host, |
| 4180 | + HIP_SYMBOL(load_callback_dev_double), |
| 4181 | + sizeof(void*)), |
| 4182 | + hipSuccess); |
| 4183 | + } |
| 4184 | return load_callback_host; |
| 4185 | } |
| 4186 | } |
| 4187 | @@ -245,6 +380,31 @@ __device__ auto store_callback_dev_complex_float = store_callback<rocfft_comple |
| 4188 | __device__ auto store_callback_dev_double = store_callback<double>; |
| 4189 | __device__ auto store_callback_dev_complex_double = store_callback<rocfft_complex<double>>; |
| 4190 | |
| 4191 | +template <typename Tdata> |
| 4192 | +__host__ __device__ static void store_callback_round_trip_inverse( |
| 4193 | + Tdata* output, size_t offset, Tdata element, void* cbdata, void* sharedMem) |
| 4194 | +{ |
| 4195 | + auto testdata = static_cast<callback_test_data*>(cbdata); |
| 4196 | + // divide each element by scalar |
| 4197 | + if(output == testdata->base) |
| 4198 | + { |
| 4199 | + output[offset] = element / testdata->scalar; |
| 4200 | + } |
| 4201 | + // otherwise, wrong base address passed, just don't write |
| 4202 | +} |
| 4203 | +__device__ auto store_callback_round_trip_inverse_dev_half |
| 4204 | + = store_callback_round_trip_inverse<rocfft_fp16>; |
| 4205 | +__device__ auto store_callback_round_trip_inverse_dev_complex_half |
| 4206 | + = store_callback_round_trip_inverse<rocfft_complex<rocfft_fp16>>; |
| 4207 | +__device__ auto store_callback_round_trip_inverse_dev_float |
| 4208 | + = store_callback_round_trip_inverse<float>; |
| 4209 | +__device__ auto store_callback_round_trip_inverse_dev_complex_float |
| 4210 | + = store_callback_round_trip_inverse<rocfft_complex<float>>; |
| 4211 | +__device__ auto store_callback_round_trip_inverse_dev_double |
| 4212 | + = store_callback_round_trip_inverse<double>; |
| 4213 | +__device__ auto store_callback_round_trip_inverse_dev_complex_double |
| 4214 | + = store_callback_round_trip_inverse<rocfft_complex<double>>; |
| 4215 | + |
| 4216 | void* get_store_callback_host(fft_array_type otype, |
| 4217 | fft_precision precision, |
| 4218 | bool round_trip_inverse = false) |
| 4219 | @@ -258,22 +418,55 @@ void* get_store_callback_host(fft_array_type otype, |
| 4220 | switch(precision) |
| 4221 | { |
| 4222 | case fft_precision_half: |
| 4223 | - EXPECT_EQ(hipMemcpyFromSymbol(&store_callback_host, |
| 4224 | - HIP_SYMBOL(store_callback_dev_complex_half), |
| 4225 | - sizeof(void*)), |
| 4226 | - hipSuccess); |
| 4227 | + if(round_trip_inverse) |
| 4228 | + { |
| 4229 | + EXPECT_EQ(hipMemcpyFromSymbol( |
| 4230 | + &store_callback_host, |
| 4231 | + HIP_SYMBOL(store_callback_round_trip_inverse_dev_complex_half), |
| 4232 | + sizeof(void*)), |
| 4233 | + hipSuccess); |
| 4234 | + } |
| 4235 | + else |
| 4236 | + { |
| 4237 | + EXPECT_EQ(hipMemcpyFromSymbol(&store_callback_host, |
| 4238 | + HIP_SYMBOL(store_callback_dev_complex_half), |
| 4239 | + sizeof(void*)), |
| 4240 | + hipSuccess); |
| 4241 | + } |
| 4242 | return store_callback_host; |
| 4243 | case fft_precision_single: |
| 4244 | - EXPECT_EQ(hipMemcpyFromSymbol(&store_callback_host, |
| 4245 | - HIP_SYMBOL(store_callback_dev_complex_float), |
| 4246 | - sizeof(void*)), |
| 4247 | - hipSuccess); |
| 4248 | + if(round_trip_inverse) |
| 4249 | + { |
| 4250 | + EXPECT_EQ(hipMemcpyFromSymbol( |
| 4251 | + &store_callback_host, |
| 4252 | + HIP_SYMBOL(store_callback_round_trip_inverse_dev_complex_float), |
| 4253 | + sizeof(void*)), |
| 4254 | + hipSuccess); |
| 4255 | + } |
| 4256 | + else |
| 4257 | + { |
| 4258 | + EXPECT_EQ(hipMemcpyFromSymbol(&store_callback_host, |
| 4259 | + HIP_SYMBOL(store_callback_dev_complex_float), |
| 4260 | + sizeof(void*)), |
| 4261 | + hipSuccess); |
| 4262 | + } |
| 4263 | return store_callback_host; |
| 4264 | case fft_precision_double: |
| 4265 | - EXPECT_EQ(hipMemcpyFromSymbol(&store_callback_host, |
| 4266 | - HIP_SYMBOL(store_callback_dev_complex_double), |
| 4267 | - sizeof(void*)), |
| 4268 | - hipSuccess); |
| 4269 | + if(round_trip_inverse) |
| 4270 | + { |
| 4271 | + EXPECT_EQ(hipMemcpyFromSymbol( |
| 4272 | + &store_callback_host, |
| 4273 | + HIP_SYMBOL(store_callback_round_trip_inverse_dev_complex_double), |
| 4274 | + sizeof(void*)), |
| 4275 | + hipSuccess); |
| 4276 | + } |
| 4277 | + else |
| 4278 | + { |
| 4279 | + EXPECT_EQ(hipMemcpyFromSymbol(&store_callback_host, |
| 4280 | + HIP_SYMBOL(store_callback_dev_complex_double), |
| 4281 | + sizeof(void*)), |
| 4282 | + hipSuccess); |
| 4283 | + } |
| 4284 | return store_callback_host; |
| 4285 | } |
| 4286 | } |
| 4287 | @@ -282,21 +475,55 @@ void* get_store_callback_host(fft_array_type otype, |
| 4288 | switch(precision) |
| 4289 | { |
| 4290 | case fft_precision_half: |
| 4291 | - EXPECT_EQ(hipMemcpyFromSymbol( |
| 4292 | - &store_callback_host, HIP_SYMBOL(store_callback_dev_half), sizeof(void*)), |
| 4293 | - hipSuccess); |
| 4294 | + if(round_trip_inverse) |
| 4295 | + { |
| 4296 | + EXPECT_EQ( |
| 4297 | + hipMemcpyFromSymbol(&store_callback_host, |
| 4298 | + HIP_SYMBOL(store_callback_round_trip_inverse_dev_half), |
| 4299 | + sizeof(void*)), |
| 4300 | + hipSuccess); |
| 4301 | + } |
| 4302 | + else |
| 4303 | + { |
| 4304 | + EXPECT_EQ(hipMemcpyFromSymbol(&store_callback_host, |
| 4305 | + HIP_SYMBOL(store_callback_dev_half), |
| 4306 | + sizeof(void*)), |
| 4307 | + hipSuccess); |
| 4308 | + } |
| 4309 | return store_callback_host; |
| 4310 | case fft_precision_single: |
| 4311 | - EXPECT_EQ(hipMemcpyFromSymbol(&store_callback_host, |
| 4312 | - HIP_SYMBOL(store_callback_dev_float), |
| 4313 | - sizeof(void*)), |
| 4314 | - hipSuccess); |
| 4315 | + if(round_trip_inverse) |
| 4316 | + { |
| 4317 | + EXPECT_EQ( |
| 4318 | + hipMemcpyFromSymbol(&store_callback_host, |
| 4319 | + HIP_SYMBOL(store_callback_round_trip_inverse_dev_float), |
| 4320 | + sizeof(void*)), |
| 4321 | + hipSuccess); |
| 4322 | + } |
| 4323 | + else |
| 4324 | + { |
| 4325 | + EXPECT_EQ(hipMemcpyFromSymbol(&store_callback_host, |
| 4326 | + HIP_SYMBOL(store_callback_dev_float), |
| 4327 | + sizeof(void*)), |
| 4328 | + hipSuccess); |
| 4329 | + } |
| 4330 | return store_callback_host; |
| 4331 | case fft_precision_double: |
| 4332 | - EXPECT_EQ(hipMemcpyFromSymbol(&store_callback_host, |
| 4333 | - HIP_SYMBOL(store_callback_dev_double), |
| 4334 | - sizeof(void*)), |
| 4335 | - hipSuccess); |
| 4336 | + if(round_trip_inverse) |
| 4337 | + { |
| 4338 | + EXPECT_EQ( |
| 4339 | + hipMemcpyFromSymbol(&store_callback_host, |
| 4340 | + HIP_SYMBOL(store_callback_round_trip_inverse_dev_double), |
| 4341 | + sizeof(void*)), |
| 4342 | + hipSuccess); |
| 4343 | + } |
| 4344 | + else |
| 4345 | + { |
| 4346 | + EXPECT_EQ(hipMemcpyFromSymbol(&store_callback_host, |
| 4347 | + HIP_SYMBOL(store_callback_dev_double), |
| 4348 | + sizeof(void*)), |
| 4349 | + hipSuccess); |
| 4350 | + } |
| 4351 | return store_callback_host; |
| 4352 | } |
| 4353 | } |
| 4354 | diff --git a/clients/tests/hipfft_mpi_worker.cpp b/clients/tests/hipfft_mpi_worker.cpp |
| 4355 | index 9a1e219..a4b85c6 100644 |
| 4356 | --- a/clients/tests/hipfft_mpi_worker.cpp |
| 4357 | +++ b/clients/tests/hipfft_mpi_worker.cpp |
| 4358 | @@ -23,6 +23,9 @@ |
| 4359 | #include "../../shared/mpi_worker.h" |
| 4360 | #include "../hipfft_params.h" |
| 4361 | |
| 4362 | +// initialize static class member of hipfft_params |
| 4363 | +std::vector<gpubuf> hipfft_params::externally_managed_workareas = std::vector<gpubuf>(); |
| 4364 | + |
| 4365 | int main(int argc, char* argv[]) |
| 4366 | { |
| 4367 | return mpi_worker_main<std::array<hipfft_params, 1>, false>( |
| 4368 | diff --git a/clients/tests/hipfft_test_params.h b/clients/tests/hipfft_test_params.h |
| 4369 | deleted file mode 100644 |
| 4370 | index 4d22fbb..0000000 |
| 4371 | --- a/clients/tests/hipfft_test_params.h |
| 4372 | +++ /dev/null |
| 4373 | @@ -1,32 +0,0 @@ |
| 4374 | -// Copyright (C) 2022 - 2022 Advanced Micro Devices, Inc. All rights reserved. |
| 4375 | -// |
| 4376 | -// Permission is hereby granted, free of charge, to any person obtaining a copy |
| 4377 | -// of this software and associated documentation files (the "Software"), to deal |
| 4378 | -// in the Software without restriction, including without limitation the rights |
| 4379 | -// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell |
| 4380 | -// copies of the Software, and to permit persons to whom the Software is |
| 4381 | -// furnished to do so, subject to the following conditions: |
| 4382 | -// |
| 4383 | -// The above copyright notice and this permission notice shall be included in |
| 4384 | -// all copies or substantial portions of the Software. |
| 4385 | -// |
| 4386 | -// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR |
| 4387 | -// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, |
| 4388 | -// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE |
| 4389 | -// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER |
| 4390 | -// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, |
| 4391 | -// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN |
| 4392 | -// THE SOFTWARE. |
| 4393 | - |
| 4394 | -#pragma once |
| 4395 | -#ifndef TESTCONSTANTS_H |
| 4396 | -#define TESTCONSTANTS_H |
| 4397 | - |
| 4398 | -#include "hipfft/hipfft.h" |
| 4399 | - |
| 4400 | -#include <stdexcept> |
| 4401 | - |
| 4402 | -extern int verbose; |
| 4403 | -extern size_t ramgb; |
| 4404 | - |
| 4405 | -#endif |
| 4406 | diff --git a/clients/tests/hipfftw_test.cpp b/clients/tests/hipfftw_test.cpp |
| 4407 | new file mode 100644 |
| 4408 | index 0000000..b0ad0b0 |
| 4409 | --- /dev/null |
| 4410 | +++ b/clients/tests/hipfftw_test.cpp |
| 4411 | @@ -0,0 +1,2214 @@ |
| 4412 | +// Copyright (C) 2025 Advanced Micro Devices, Inc. All rights reserved. |
| 4413 | +// |
| 4414 | +// Permission is hereby granted, free of charge, to any person obtaining a copy |
| 4415 | +// of this software and associated documentation files (the "Software"), to deal |
| 4416 | +// in the Software without restriction, including without limitation the rights |
| 4417 | +// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell |
| 4418 | +// copies of the Software, and to permit persons to whom the Software is |
| 4419 | +// furnished to do so, subject to the following conditions: |
| 4420 | +// |
| 4421 | +// The above copyright notice and this permission notice shall be included in |
| 4422 | +// all copies or substantial portions of the Software. |
| 4423 | +// |
| 4424 | +// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR |
| 4425 | +// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, |
| 4426 | +// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE |
| 4427 | +// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER |
| 4428 | +// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, |
| 4429 | +// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN |
| 4430 | +// THE SOFTWARE. |
| 4431 | + |
| 4432 | +#include "../hipfftw_helper.h" |
| 4433 | + |
| 4434 | +#include "../../shared/environment.h" |
| 4435 | +#include "../../shared/gpubuf.h" |
| 4436 | +#include "../../shared/hostbuf.h" |
| 4437 | +#include "../../shared/params_gen.h" |
| 4438 | +#include "../../shared/test_params.h" |
| 4439 | + |
| 4440 | +#include <cstdint> |
| 4441 | +#include <cstring> |
| 4442 | +#include <fftw3.h> |
| 4443 | +#include <future> |
| 4444 | +#include <gtest/gtest.h> |
| 4445 | +#include <memory> |
| 4446 | +#include <numeric> |
| 4447 | + |
| 4448 | +#ifdef WIN32 |
| 4449 | +#include <windows.h> |
| 4450 | +#else |
| 4451 | +#include <cstdlib> |
| 4452 | +#endif |
| 4453 | + |
| 4454 | +#ifdef _OPENMP |
| 4455 | +#include <omp.h> |
| 4456 | +#endif |
| 4457 | + |
| 4458 | +// test details |
| 4459 | +namespace |
| 4460 | +{ |
| 4461 | + // |
| 4462 | + //--------------------------------------------------------------------------------------------- |
| 4463 | + // COMMONS AND HELPERS |
| 4464 | + //--------------------------------------------------------------------------------------------- |
| 4465 | + // |
| 4466 | + |
| 4467 | + size_t max_byte_size_for_hipfftw_tests() |
| 4468 | + { |
| 4469 | + auto get_io_byte_size_limit = []() { |
| 4470 | + size_t tmp = vramgb * ONE_GiB; |
| 4471 | + if(tmp == 0) |
| 4472 | + { |
| 4473 | + size_t free = 0, total = 0; |
| 4474 | + if(hipMemGetInfo(&free, &total) == hipSuccess) |
| 4475 | + tmp = total / 8; |
| 4476 | + } |
| 4477 | + tmp = std::min(tmp, ramgb * ONE_GiB); |
| 4478 | + tmp = std::min(tmp, max_io_gb_for_hipfftw_test * ONE_GiB); |
| 4479 | + if(verbose > 0) |
| 4480 | + { |
| 4481 | + std::cout << "Limit for the size of I/O data used in hipfftw tests: "; |
| 4482 | + if(tmp >= ONE_GiB) |
| 4483 | + std::cout << float(tmp) / ONE_GiB << " GiB." << std ::endl; |
| 4484 | + else if(tmp >= ONE_MiB) |
| 4485 | + std::cout << float(tmp) / ONE_MiB << " MiB." << std ::endl; |
| 4486 | + else |
| 4487 | + std::cout << float(tmp) / ONE_KiB << " KiB." << std ::endl; |
| 4488 | + } |
| 4489 | + return tmp; |
| 4490 | + }; |
| 4491 | + static const size_t io_byte_size_limit = get_io_byte_size_limit(); |
| 4492 | + return io_byte_size_limit; |
| 4493 | + } |
| 4494 | + |
| 4495 | + std::ranlux24_base& get_pseudo_rng() |
| 4496 | + { |
| 4497 | + static std::ranlux24_base gen(random_seed); |
| 4498 | + return gen; |
| 4499 | + } |
| 4500 | + |
| 4501 | + // NOTE: this function makes use of comparison operator < and != which must be defined for the |
| 4502 | + // specialization type T |
| 4503 | + template <typename T> |
| 4504 | + void insert_into_unique_sorted_params(std::vector<T>& unique_sorted_params, |
| 4505 | + const T& param_to_insert) |
| 4506 | + { |
| 4507 | + auto it = std::lower_bound( |
| 4508 | + unique_sorted_params.begin(), unique_sorted_params.end(), param_to_insert); |
| 4509 | + if(it != unique_sorted_params.end() && *it == param_to_insert) |
| 4510 | + return; // it's already in there generated |
| 4511 | + unique_sorted_params.insert(it, param_to_insert); |
| 4512 | + } |
| 4513 | + |
| 4514 | + enum class hipfftw_internal_exception |
| 4515 | + { |
| 4516 | + none, |
| 4517 | + flow_redirection, |
| 4518 | + invalid_args, |
| 4519 | + unsupported_args, |
| 4520 | + ill_defined |
| 4521 | + }; |
| 4522 | + |
| 4523 | + // for well-defined internal exceptions, we may expect a specific report thereof |
| 4524 | + // in the hipfftw exception log (if logging is activated) |
| 4525 | + template <hipfftw_internal_exception> |
| 4526 | + constexpr std::string_view hipfftw_expected_log_instance; |
| 4527 | + template <> |
| 4528 | + constexpr std::string_view hipfftw_expected_log_instance< |
| 4529 | + hipfftw_internal_exception::invalid_args> = R"(Invalid argument reported)"; |
| 4530 | + template <> |
| 4531 | + constexpr std::string_view hipfftw_expected_log_instance< |
| 4532 | + hipfftw_internal_exception::unsupported_args> = R"(Unsupported usage reported)"; |
| 4533 | + template <> |
| 4534 | + constexpr std::string_view hipfftw_expected_log_instance< |
| 4535 | + hipfftw_internal_exception::flow_redirection> = R"(Redirecting execution flow)"; |
| 4536 | + |
| 4537 | + // randomizers |
| 4538 | + // Note: albeit not supported, ranks > 3 are "valid" rank argument |
| 4539 | + // --> limiting rank value to max of 10 by default to avoid ridiculously long |
| 4540 | + // lengths possibly created in automated parameter generations; |
| 4541 | + template <bool validity_flag, |
| 4542 | + int min_rank = validity_flag ? 1 : std::numeric_limits<int>::lowest(), |
| 4543 | + int max_rank = validity_flag ? 10 : 0, |
| 4544 | + std::enable_if_t<(min_rank <= max_rank) && (!validity_flag || min_rank > 0) |
| 4545 | + && (validity_flag || max_rank <= 0), |
| 4546 | + bool> = true> |
| 4547 | + int get_random_rank() |
| 4548 | + { |
| 4549 | + static std::uniform_int_distribution<int> rank_rng(min_rank, max_rank); |
| 4550 | + |
| 4551 | + auto ret = rank_rng(get_pseudo_rng()); |
| 4552 | + if(rank_is_valid_for_hipfftw(ret) != validity_flag) |
| 4553 | + { |
| 4554 | + throw std::runtime_error( |
| 4555 | + "failed to generate a rank value of desired validity randomly"); |
| 4556 | + } |
| 4557 | + return ret; |
| 4558 | + } |
| 4559 | + template <bool validity_flag, typename type_to_consider_for_validity> |
| 4560 | + std::vector<ptrdiff_t> |
| 4561 | + get_random_lengths(int desired_rank, |
| 4562 | + ptrdiff_t max_abs_len |
| 4563 | + = std::numeric_limits<type_to_consider_for_validity>::max(), |
| 4564 | + ptrdiff_t min_abs_len = 0) |
| 4565 | + { |
| 4566 | + std::vector<ptrdiff_t> ret; |
| 4567 | + // cannot generate lengths for invalid ranks --> return empty lengths in that case |
| 4568 | + if(!rank_is_valid_for_hipfftw(desired_rank)) |
| 4569 | + return ret; |
| 4570 | + if(min_abs_len < 0 || min_abs_len > max_abs_len) |
| 4571 | + throw std::invalid_argument("invalid bounds used for get_random_lengths"); |
| 4572 | + // generate values that are all representable as integers |
| 4573 | + auto& pseudo_rng = get_pseudo_rng(); |
| 4574 | + std::uniform_int_distribution<ptrdiff_t> length_rng(min_abs_len, max_abs_len); |
| 4575 | + // setter lambda |
| 4576 | + auto set_random_len = [&]() { |
| 4577 | + for(auto& l : ret) |
| 4578 | + { |
| 4579 | + const ptrdiff_t val = length_rng(pseudo_rng); |
| 4580 | + if constexpr(validity_flag) |
| 4581 | + l = val; |
| 4582 | + else |
| 4583 | + { |
| 4584 | + if(pseudo_rng() % 2) |
| 4585 | + l = -val; |
| 4586 | + else |
| 4587 | + l = val; |
| 4588 | + } |
| 4589 | + } |
| 4590 | + }; |
| 4591 | + |
| 4592 | + ret.resize(desired_rank); |
| 4593 | + set_random_len(); |
| 4594 | + while(lengths_are_valid_for_hipfftw_as<type_to_consider_for_validity>(ret, desired_rank) |
| 4595 | + != validity_flag) |
| 4596 | + set_random_len(); |
| 4597 | + |
| 4598 | + return ret; |
| 4599 | + } |
| 4600 | + template <bool validity_flag> |
| 4601 | + int get_random_sign(fft_transform_type intended_dft_kind) |
| 4602 | + { |
| 4603 | + if(!validity_flag && is_real(intended_dft_kind)) |
| 4604 | + throw std::invalid_argument("An invalid sign cannot be generated for real transforms " |
| 4605 | + "(sign is ignored in that case)"); |
| 4606 | + std::uniform_int_distribution<int> sign_rng(std::numeric_limits<int>::lowest(), |
| 4607 | + std::numeric_limits<int>::max()); |
| 4608 | + |
| 4609 | + int tmp = validity_flag && is_complex(intended_dft_kind) |
| 4610 | + ? (is_fwd(intended_dft_kind) ? FFTW_FORWARD : FFTW_BACKWARD) |
| 4611 | + : sign_rng(get_pseudo_rng()); |
| 4612 | + |
| 4613 | + while(sign_is_valid_for_hipfftw(tmp, intended_dft_kind) != validity_flag) |
| 4614 | + tmp = sign_rng(get_pseudo_rng()); |
| 4615 | + return tmp; |
| 4616 | + } |
| 4617 | + template <bool validity_flag> |
| 4618 | + unsigned get_random_flags() |
| 4619 | + { |
| 4620 | + std::uniform_int_distribution<unsigned> flags_rng(std::numeric_limits<unsigned>::lowest(), |
| 4621 | + std::numeric_limits<unsigned>::max()); |
| 4622 | + |
| 4623 | + auto tmp = flags_rng(get_pseudo_rng()); |
| 4624 | + if constexpr(validity_flag) |
| 4625 | + { |
| 4626 | + tmp &= hipfftw_valid_flags_mask; |
| 4627 | + if(!flags_are_valid_for_hipfftw(tmp)) |
| 4628 | + throw std::runtime_error("failed to create random valid flags"); |
| 4629 | + return tmp; |
| 4630 | + } |
| 4631 | + while(flags_are_valid_for_hipfftw(tmp)) |
| 4632 | + tmp = flags_rng(get_pseudo_rng()); |
| 4633 | + return tmp; |
| 4634 | + } |
| 4635 | + size_t get_random_idx(size_t upper_bound) |
| 4636 | + { |
| 4637 | + if(upper_bound == 0) |
| 4638 | + throw std::invalid_argument("upper_bound must be strictly positive for get_random_idx"); |
| 4639 | + std::uniform_int_distribution<size_t> idx_rng(0, upper_bound - 1); |
| 4640 | + return idx_rng(get_pseudo_rng()); |
| 4641 | + } |
| 4642 | + |
| 4643 | + // calculates the threshold value X such that max_data_idx is no greater |
| 4644 | + // (resp. larger) than num_elems, if all elements of lengths are all no greater |
| 4645 | + // (resp. larger) than X, and lengths.size() == rank [using bisection] |
| 4646 | + ptrdiff_t get_len_threshold(size_t num_elems, int rank, bool is_real_inplace) |
| 4647 | + { |
| 4648 | + if(rank < 1) |
| 4649 | + throw std::invalid_argument("invalid rank used in get_len_threshold"); |
| 4650 | + if(num_elems == 0) |
| 4651 | + return 1; |
| 4652 | + constexpr ptrdiff_t X_max = std::numeric_limits<ptrdiff_t>::max(); |
| 4653 | + // we need to find X in [0, X_max] s.t. |
| 4654 | + // largest_idx(X) <= num_elems && largest_idx(X + 1) > num_elems |
| 4655 | + auto largest_idx = [&](ptrdiff_t X) { |
| 4656 | + size_t ret = rank > 1 && is_real_inplace ? 2 * (X / 2 + 1) : X; |
| 4657 | + for(auto i = 1; i < rank; i++) |
| 4658 | + ret *= X; |
| 4659 | + return ret; |
| 4660 | + }; |
| 4661 | + // initialization |
| 4662 | + ptrdiff_t X_down |
| 4663 | + = rank == 1 ? static_cast<ptrdiff_t>(num_elems) |
| 4664 | + : static_cast<ptrdiff_t>(std::floor(std::pow(num_elems, 1.0 / rank))); |
| 4665 | + ptrdiff_t diff = 1; |
| 4666 | + ptrdiff_t X_up = X_down; |
| 4667 | + while(largest_idx(X_up) <= num_elems && X_up < X_max) |
| 4668 | + { |
| 4669 | + X_down = X_up; |
| 4670 | + X_up = X_up <= X_max - diff ? X_up + diff : X_max; |
| 4671 | + diff *= 2; |
| 4672 | + } |
| 4673 | + diff = 1; |
| 4674 | + while(largest_idx(X_down) > num_elems && X_down > 0) |
| 4675 | + { |
| 4676 | + X_up = X_down; |
| 4677 | + X_down = X_down >= diff ? X_down - diff : 0; |
| 4678 | + diff *= 2; |
| 4679 | + } |
| 4680 | + // bisection |
| 4681 | + while(X_up - X_down > 1) |
| 4682 | + { |
| 4683 | + const auto tmp = (X_up + X_down) / 2; |
| 4684 | + if(largest_idx(tmp) <= num_elems) |
| 4685 | + X_down = tmp; |
| 4686 | + else |
| 4687 | + X_up = tmp; |
| 4688 | + } |
| 4689 | + return X_down; |
| 4690 | + } |
| 4691 | + |
| 4692 | + template <fft_precision prec> |
| 4693 | + size_t max_num_elems_for_data_size(size_t data_byte_size, fft_transform_type dft_kind) |
| 4694 | + { |
| 4695 | + return data_byte_size |
| 4696 | + / (is_real(dft_kind) ? sizeof(hipfftw_real_t<prec>) |
| 4697 | + : sizeof(hipfftw_complex_t<prec>)); |
| 4698 | + } |
| 4699 | + |
| 4700 | + // exception for hip runtime error(s) specifically |
| 4701 | + struct hip_runtime_error : public std::runtime_error |
| 4702 | + { |
| 4703 | + const hipError_t hip_error; |
| 4704 | + hip_runtime_error(const std::string& info, hipError_t hip_status) |
| 4705 | + : std::runtime_error::runtime_error(info) |
| 4706 | + , hip_error(hip_status) |
| 4707 | + |
| 4708 | + { |
| 4709 | + } |
| 4710 | + }; |
| 4711 | + int get_current_device_id() |
| 4712 | + { |
| 4713 | + int ret = hipInvalidDeviceId; |
| 4714 | + const auto hip_status = hipGetDevice(&ret); |
| 4715 | + if(hip_status != hipSuccess) |
| 4716 | + throw hip_runtime_error("hipGetDevice failed.", hip_status); |
| 4717 | + return ret; |
| 4718 | + } |
| 4719 | + |
| 4720 | + // |
| 4721 | + //--------------------------------------------------------------------------------------------- |
| 4722 | + // EXISTENCE OF UTILITY FUNCTIONS |
| 4723 | + //--------------------------------------------------------------------------------------------- |
| 4724 | + // |
| 4725 | + |
| 4726 | + template <fft_precision prec> |
| 4727 | + void test_existence_of_utility_functions() |
| 4728 | + { |
| 4729 | + try |
| 4730 | + { |
| 4731 | + // call utility functions - they need to exist but don't need to work |
| 4732 | + const auto& hipfftw_ = hipfftw_funcs<prec>::get_instance(); |
| 4733 | + hipfftw_.print_plan(nullptr); |
| 4734 | + hipfftw_.set_timelimit(0.0); |
| 4735 | + hipfftw_.cost(nullptr); |
| 4736 | + hipfftw_.flops(nullptr, nullptr, nullptr, nullptr); |
| 4737 | + hipfftw_.cleanup(); |
| 4738 | + } |
| 4739 | + catch(const hipfftw_undefined_function_ptr& e) |
| 4740 | + { |
| 4741 | + GTEST_FAIL() << "Undefined function pointers detected. Error info: " << e.what(); |
| 4742 | + } |
| 4743 | + catch(...) |
| 4744 | + { |
| 4745 | + GTEST_FAIL() << "Unexpected failure"; |
| 4746 | + } |
| 4747 | + } |
| 4748 | + |
| 4749 | + // |
| 4750 | + //--------------------------------------------------------------------------------------------- |
| 4751 | + // ALLOCATION AND FREE |
| 4752 | + //--------------------------------------------------------------------------------------------- |
| 4753 | + // |
| 4754 | + |
| 4755 | + enum class hipfftw_alloc_func_type |
| 4756 | + { |
| 4757 | + unspecified, |
| 4758 | + real, |
| 4759 | + complex |
| 4760 | + }; |
| 4761 | + bool hipfftw_alloc_func_type_is_valid(hipfftw_alloc_func_type func) |
| 4762 | + { |
| 4763 | + return func == hipfftw_alloc_func_type::unspecified || func == hipfftw_alloc_func_type::real |
| 4764 | + || func == hipfftw_alloc_func_type::complex; |
| 4765 | + } |
| 4766 | + // bit mask to prevent allocation kind(s), by increasing "rank" to enable meaningful |
| 4767 | + // comparison operators (implicitly defined based on the underlying type) |
| 4768 | + enum hipfftw_alloc_memkind : unsigned |
| 4769 | + { |
| 4770 | + none = 0x0, |
| 4771 | + pageable_host = 0x1 << 0, |
| 4772 | + pinned_host = 0x1 << 1, |
| 4773 | + any = pageable_host | pinned_host |
| 4774 | + }; |
| 4775 | + |
| 4776 | + const std::vector<hipfftw_alloc_memkind> hipfftw_possible_memkinds |
| 4777 | + = {pinned_host, pageable_host}; |
| 4778 | + |
| 4779 | + bool hipfftw_alloc_kind_is_valid(hipfftw_alloc_memkind kind) |
| 4780 | + { |
| 4781 | + return kind == (kind & hipfftw_alloc_memkind::any); |
| 4782 | + } |
| 4783 | + std::string hipfftw_alloc_kind_to_string(hipfftw_alloc_memkind kind) |
| 4784 | + { |
| 4785 | + if(!hipfftw_alloc_kind_is_valid(kind)) |
| 4786 | + throw std::invalid_argument("alloc_kind_to_string: invalid kind"); |
| 4787 | + if(kind == hipfftw_alloc_memkind::none) |
| 4788 | + return "none"; |
| 4789 | + if(std::find(hipfftw_possible_memkinds.begin(), hipfftw_possible_memkinds.end(), kind) |
| 4790 | + == hipfftw_possible_memkinds.end()) |
| 4791 | + { |
| 4792 | + // several values enabled |
| 4793 | + std::string ret; |
| 4794 | + for(auto to_consider : hipfftw_possible_memkinds) |
| 4795 | + { |
| 4796 | + if(!(kind & to_consider)) |
| 4797 | + continue; |
| 4798 | + if(!ret.empty()) |
| 4799 | + ret += "_or_"; |
| 4800 | + ret += hipfftw_alloc_kind_to_string(to_consider); |
| 4801 | + } |
| 4802 | + return ret; |
| 4803 | + } |
| 4804 | + // kind is a well-defined value in hipfftw_possible_memkinds |
| 4805 | + switch(kind) |
| 4806 | + { |
| 4807 | + case hipfftw_alloc_memkind::pinned_host: |
| 4808 | + return "pinned_host"; |
| 4809 | + break; |
| 4810 | + case hipfftw_alloc_memkind::pageable_host: |
| 4811 | + return "pageable_host"; |
| 4812 | + break; |
| 4813 | + default: |
| 4814 | + throw std::runtime_error("alloc_kind_to_string: internal error encountered " |
| 4815 | + "(unexpected value for kind)"); |
| 4816 | + break; |
| 4817 | + } |
| 4818 | + // unreachable |
| 4819 | + } |
| 4820 | + |
| 4821 | + template <fft_precision prec> |
| 4822 | + struct hipfftw_malloc_params |
| 4823 | + { |
| 4824 | + size_t alloc_arg; |
| 4825 | + hipfftw_alloc_func_type alloc_func; |
| 4826 | + hipfftw_alloc_memkind alloc_kind; |
| 4827 | + |
| 4828 | + size_t get_byte_size() const |
| 4829 | + { |
| 4830 | + return alloc_arg |
| 4831 | + * (alloc_func == hipfftw_alloc_func_type::unspecified |
| 4832 | + ? sizeof(char) |
| 4833 | + : (alloc_func == hipfftw_alloc_func_type::real |
| 4834 | + ? sizeof(hipfftw_real_t<prec>) |
| 4835 | + : sizeof(hipfftw_complex_t<prec>))); |
| 4836 | + } |
| 4837 | + |
| 4838 | + std::string to_string() const |
| 4839 | + { |
| 4840 | + if(!hipfftw_alloc_func_type_is_valid(alloc_func)) |
| 4841 | + throw std::runtime_error("invalid type of allocation function"); |
| 4842 | + if(!hipfftw_alloc_kind_is_valid(alloc_kind)) |
| 4843 | + throw std::runtime_error("invalid allocation kind(s)"); |
| 4844 | + |
| 4845 | + std::string ret; |
| 4846 | + if constexpr(prec == fft_precision_single) |
| 4847 | + ret += "fftwf_"; |
| 4848 | + else |
| 4849 | + ret += "fftw_"; |
| 4850 | + |
| 4851 | + if(alloc_func == hipfftw_alloc_func_type::unspecified) |
| 4852 | + ret += "malloc_"; |
| 4853 | + else if(alloc_func == hipfftw_alloc_func_type::real) |
| 4854 | + ret += "alloc_real_"; |
| 4855 | + else |
| 4856 | + ret += "alloc_complex_"; |
| 4857 | + ret += std::to_string(alloc_arg); |
| 4858 | + ret += "_alloc_kind_" + hipfftw_alloc_kind_to_string(alloc_kind); |
| 4859 | + return ret; |
| 4860 | + } |
| 4861 | + // for using with insert_into_unique_sorted_params |
| 4862 | + bool operator<(const hipfftw_malloc_params& other) const |
| 4863 | + { |
| 4864 | + return to_string() < other.to_string(); |
| 4865 | + } |
| 4866 | + bool operator==(const hipfftw_malloc_params& other) const |
| 4867 | + { |
| 4868 | + return to_string() == other.to_string(); |
| 4869 | + } |
| 4870 | + }; |
| 4871 | + |
| 4872 | + template <fft_precision prec> |
| 4873 | + std::vector<hipfftw_malloc_params<prec>> params_for_testing_hipfftw_malloc() |
| 4874 | + { |
| 4875 | + std::vector<hipfftw_malloc_params<prec>> ret; |
| 4876 | + // testing argument value 0 and a randomly chosen one (max 64MiB in byte size, arbitrarily chosen) |
| 4877 | + constexpr size_t max_test_alloc_size = 1ULL << 26; |
| 4878 | + |
| 4879 | + const std::vector<hipfftw_alloc_func_type> func_range |
| 4880 | + = {hipfftw_alloc_func_type::unspecified, |
| 4881 | + hipfftw_alloc_func_type::real, |
| 4882 | + hipfftw_alloc_func_type::complex}; |
| 4883 | + std::vector<hipfftw_alloc_memkind> memkind_range; |
| 4884 | + // add all possible combinations of memory kinds: |
| 4885 | + for(auto kind = static_cast<std::underlying_type<hipfftw_alloc_memkind>::type>( |
| 4886 | + hipfftw_alloc_memkind::none); |
| 4887 | + kind <= static_cast<std::underlying_type<hipfftw_alloc_memkind>::type>( |
| 4888 | + hipfftw_alloc_memkind::any); |
| 4889 | + kind++) |
| 4890 | + { |
| 4891 | + memkind_range.push_back(static_cast<hipfftw_alloc_memkind>(kind)); |
| 4892 | + } |
| 4893 | + |
| 4894 | + hipfftw_malloc_params<prec> to_add; |
| 4895 | + for(auto func : func_range) |
| 4896 | + { |
| 4897 | + size_t max_arg = max_test_alloc_size; |
| 4898 | + if(func == hipfftw_alloc_func_type::real) |
| 4899 | + max_arg /= sizeof(hipfftw_real_t<prec>); |
| 4900 | + else if(func == hipfftw_alloc_func_type::complex) |
| 4901 | + max_arg /= sizeof(hipfftw_complex_t<prec>); |
| 4902 | + std::uniform_int_distribution<size_t> arg_rng(1, max_arg); |
| 4903 | + for(auto kind : memkind_range) |
| 4904 | + { |
| 4905 | + for(auto arg : {size_t(0), arg_rng(get_pseudo_rng())}) |
| 4906 | + { |
| 4907 | + to_add.alloc_arg = arg; |
| 4908 | + to_add.alloc_func = func; |
| 4909 | + to_add.alloc_kind = kind; |
| 4910 | + insert_into_unique_sorted_params(ret, to_add); |
| 4911 | + } |
| 4912 | + } |
| 4913 | + } |
| 4914 | + return ret; |
| 4915 | + } |
| 4916 | + |
| 4917 | + template <fft_precision prec> |
| 4918 | + class hipfftw_allocation_test : public ::testing::TestWithParam<hipfftw_malloc_params<prec>> |
| 4919 | + { |
| 4920 | + protected: |
| 4921 | + void* test_allocation = nullptr; |
| 4922 | + bool expect_no_allocation; |
| 4923 | + std::map<hipfftw_alloc_memkind, std::unique_ptr<EnvironmentSetTemp>> temp_alloc_limit_env; |
| 4924 | + |
| 4925 | + void SetUp() override |
| 4926 | + { |
| 4927 | + if(test_allocation) |
| 4928 | + GTEST_FAIL() << "Starting from an unclean slate (test_allocation is not nullptr)"; |
| 4929 | + const hipfftw_malloc_params<prec>& params = this->GetParam(); |
| 4930 | + // check validity of params |
| 4931 | + if(!hipfftw_alloc_kind_is_valid(params.alloc_kind)) |
| 4932 | + GTEST_FAIL() << "invalid value for allocation kind"; |
| 4933 | + if(!hipfftw_alloc_func_type_is_valid(params.alloc_func)) |
| 4934 | + GTEST_FAIL() << "unknown allocation function"; |
| 4935 | + |
| 4936 | + size_t limit_for_alloc_kind = 0; |
| 4937 | + for(auto alloc_kind_candidate : hipfftw_possible_memkinds) |
| 4938 | + { |
| 4939 | + if(alloc_kind_candidate != hipfftw_alloc_memkind::pinned_host |
| 4940 | + && alloc_kind_candidate != hipfftw_alloc_memkind::pageable_host) |
| 4941 | + { |
| 4942 | + throw std::runtime_error("unexpected memory allocation kind " |
| 4943 | + + hipfftw_alloc_kind_to_string(alloc_kind_candidate)); |
| 4944 | + } |
| 4945 | + const std::string control_env_var |
| 4946 | + = alloc_kind_candidate == hipfftw_alloc_memkind::pinned_host |
| 4947 | + ? "HIPFFTW_BYTE_SIZE_LIMIT_PINNED_HOST_ALLOC" |
| 4948 | + : "HIPFFTW_BYTE_SIZE_LIMIT_PAGEABLE_HOST_ALLOC"; |
| 4949 | + |
| 4950 | + if(alloc_kind_candidate & params.alloc_kind) |
| 4951 | + { |
| 4952 | + const auto test_user_limit = rocfft_getenv(control_env_var.c_str()); |
| 4953 | + limit_for_alloc_kind |
| 4954 | + = std::max(limit_for_alloc_kind, |
| 4955 | + test_user_limit.empty() ? std::numeric_limits<size_t>::max() |
| 4956 | + : size_t(std::stoull(test_user_limit))); |
| 4957 | + } |
| 4958 | + else |
| 4959 | + { |
| 4960 | + // disable the other possible allocation kind(s) by temporarily |
| 4961 | + // setting the corresponding byte size limit to 0 |
| 4962 | + temp_alloc_limit_env[alloc_kind_candidate] |
| 4963 | + = std::make_unique<EnvironmentSetTemp>(control_env_var.c_str(), "0"); |
| 4964 | + // skip if temporary limit(s) was(were) not successfully set |
| 4965 | + const auto tmp_limit = rocfft_getenv(control_env_var.c_str()); |
| 4966 | + if(tmp_limit.empty() || std::stoull(tmp_limit) != 0) |
| 4967 | + { |
| 4968 | + GTEST_SKIP() << "failed to set environment variable disabling " |
| 4969 | + << hipfftw_alloc_kind_to_string(alloc_kind_candidate) |
| 4970 | + << " allocation(s) by hipFFTW"; |
| 4971 | + } |
| 4972 | + } |
| 4973 | + } |
| 4974 | + const size_t req_byte_size = params.get_byte_size(); |
| 4975 | + expect_no_allocation = params.alloc_kind == hipfftw_alloc_memkind::none |
| 4976 | + || req_byte_size == 0 || req_byte_size > limit_for_alloc_kind; |
| 4977 | + } |
| 4978 | + void TearDown() override |
| 4979 | + { |
| 4980 | + temp_alloc_limit_env.clear(); |
| 4981 | + const hipfftw_funcs<prec>& hipfftw_impl = hipfftw_funcs<prec>::get_instance(); |
| 4982 | + if(test_allocation && !hipfftw_impl.free.may_be_used()) |
| 4983 | + GTEST_FAIL() << "An allocation was created but it can't be freed"; |
| 4984 | + // note: free should be stable even with nullptr |
| 4985 | + if(hipfftw_impl.free.may_be_used()) |
| 4986 | + hipfftw_impl.free(test_allocation); |
| 4987 | + } |
| 4988 | + |
| 4989 | + void test_malloc_write_and_read() |
| 4990 | + { |
| 4991 | + const hipfftw_malloc_params<prec>& params = this->GetParam(); |
| 4992 | + const hipfftw_funcs<prec>& hipfftw_impl = hipfftw_funcs<prec>::get_instance(); |
| 4993 | + hipfftw_exception_logger exception_logger; |
| 4994 | + |
| 4995 | + struct allocation_test_to_be_skipped : std::runtime_error |
| 4996 | + { |
| 4997 | + using std::runtime_error::runtime_error; |
| 4998 | + }; |
| 4999 | + struct allocation_test_failed : std::runtime_error |
| 5000 | + { |

Uploaded package to this PPA: https:/ /launchpad. net/~b0b0a/ +archive/ ubuntu/ hipfft- 2140314
(-proposed and amd64, amd64v3, arm64 archs enabled)
Will trigger autopkgtest when it is built and published