Merge ~bullwinkle-team/ubuntu/+source/hipfft:bullwinkle/llvm-21/ubuntu/devel into ubuntu/+source/hipfft:ubuntu/devel

Proposed by Bojan Aleksovski
Status: Merged
Approved by: Andreas Hasenack
Approved revision: 9230b376984059cb3b468ca7c94659fcda3a16ca
Merged at revision: 9230b376984059cb3b468ca7c94659fcda3a16ca
Proposed branch: ~bullwinkle-team/ubuntu/+source/hipfft:bullwinkle/llvm-21/ubuntu/devel
Merge into: ubuntu/+source/hipfft:ubuntu/devel
Diff against target: 12215 lines (+8585/-849)
64 files modified
.github/CODEOWNERS (+2/-2)
CHANGELOG.md (+21/-2)
CMakeLists.txt (+10/-29)
LICENSE.md (+3/-1)
README.md (+23/-22)
clients/CMakeLists.txt (+2/-2)
clients/bench/CMakeLists.txt (+0/-16)
clients/bench/bench.cpp (+8/-0)
clients/hipfft_params.h (+551/-64)
clients/hipfftw_helper.h (+1238/-0)
clients/samples/CMakeLists.txt (+11/-4)
clients/samples/hipfft_callback.cpp (+8/-0)
clients/tests/CMakeLists.txt (+113/-22)
clients/tests/accuracy_test_1D.cpp (+45/-35)
clients/tests/accuracy_test_2D.cpp (+21/-0)
clients/tests/accuracy_test_3D.cpp (+22/-0)
clients/tests/accuracy_test_callback.cpp (+21/-16)
clients/tests/gtest_main.cpp (+164/-143)
clients/tests/hipfft_accuracy_test.cpp (+292/-65)
clients/tests/hipfft_mpi_worker.cpp (+3/-0)
clients/tests/hipfftw_test.cpp (+2214/-0)
clients/tests/multi_device_test.cpp (+29/-11)
clients/tests/multi_stream_test.cpp (+875/-0)
clients/tests/simple_test.cpp (+114/-9)
cmake/dependencies.cmake (+16/-25)
debian/bin/bug/libhipfftw0 (+9/-0)
debian/changelog (+39/-0)
debian/control (+29/-12)
debian/libhipfft-dev.install (+1/-0)
debian/libhipfftw0.install (+2/-0)
debian/libhipfftw0.lintian-overrides (+2/-0)
debian/libhipfftw0.symbols.amd64 (+47/-0)
debian/patches/add-so-version-for-libhipfftw0.patch (+50/-0)
debian/patches/bump-rocm-docs-core-in-docs-sphinx.patch (+65/-0)
debian/patches/fix-doxygen-refs.patch (+37/-32)
debian/patches/series (+3/-3)
debian/rules (+16/-8)
debian/tests/control (+1/-1)
dev/null (+0/-22)
docs/doxygen/Doxyfile (+1/-1)
docs/index.rst (+6/-2)
docs/install/building-installing-hipfft.rst (+12/-30)
docs/reference/fft-api-usage.rst (+20/-6)
docs/sphinx/_toc.yml.in (+1/-1)
docs/sphinx/requirements.in (+1/-1)
docs/sphinx/requirements.txt (+135/-4)
library/CMakeLists.txt (+52/-46)
library/include/hipfft/hipfft.h (+39/-31)
library/include/hipfft/hipfftw.h (+142/-0)
library/src/CMakeLists.txt (+4/-1)
library/src/amd_detail/hipfft.cpp (+22/-44)
library/src/amd_detail/hipfftw.cpp (+1627/-0)
library/src/nvidia_detail/hipfft.cpp (+2/-0)
rtest.xml (+3/-0)
shared/accuracy_test.h (+23/-9)
shared/environment.h (+2/-2)
shared/fft_params.h (+163/-22)
shared/gpubuf.h (+14/-10)
shared/hostbuf.h (+62/-34)
shared/params_gen.h (+46/-17)
shared/rocfft_params.h (+14/-10)
shared/sys_mem.h (+81/-29)
shared/test_params.h (+2/-0)
toolchain-windows.cmake (+4/-3)
Reviewer Review Type Date Requested Status
Andreas Hasenack Approve
Ubuntu Sponsors Pending
Review via email: mp+499795@code.launchpad.net

Description of the change

Update to new upstream version 7.1.0

To post a comment you must log in.
Revision history for this message
Bojan Aleksovski (b0b0a) wrote :

Uploaded package to this PPA: https://launchpad.net/~b0b0a/+archive/ubuntu/hipfft-2140314

(-proposed and amd64, amd64v3, arm64 archs enabled)

Will trigger autopkgtest when it is built and published

Revision history for this message
Bojan Aleksovski (b0b0a) wrote :

`reverse-depends --arch ppc64el src:hipfft -x` returns No reverse dependencies found.

Revision history for this message
Andreas Hasenack (ahasenack) wrote :

--- a/debian/control
+++ b/debian/control
+Package: libhipfftw0
+Section: libs
+Architecture: amd64 arm64
+XB-X-ROCm-GPU-Architecture: ${rocm:GPU-Architecture}
+Depends: ${misc:Depends}, ${shlibs:Depends},
+Description: portable interface for Fast Fourier Transforms on the GPU - library
+ hipFFTW is a GPU-aware library for fast Fourier transforms using rocFFT as the
+ backend. It exports an interface borrowing the most commonly-used symbols of
+ FFTW. hipFFTW does not require its computational input and output to be
+ directly accessible by the GPU.
+ .
+ This package provides the AMD ROCm hipFFTW library.

Do you know if debian is going to add this same binary package, and also have libhipfft-dev pull in both libhipfft0 and this new libhipfftw0, like below?

 Package: libhipfft-dev
 Section: libdevel
-Architecture: amd64 arm64 ppc64el
-Depends: libhipfft0 (= ${binary:Version}),${misc:Depends}, ${shlibs:Depends},
- libamdhip64-dev,
+Architecture: amd64 arm64
+Depends: libhipfft0 (= ${binary:Version}),libhipfftw0 (= ${binary:Version}),
+ ${misc:Depends}, ${shlibs:Depends}, libamdhip64-dev,

Dealing with differences in packaging is a very annoying delta do carry in Ubuntu specially if Debian decides on a different split.

review: Needs Information
Revision history for this message
Andreas Hasenack (ahasenack) wrote :

--- /dev/null
+++ b/debian/patches/add-so-version-for-libhipfftw0.patch
@@ -0,0 +1,45 @@
+From: Bojan Aleksovski <email address hidden>
+Date: Wed, 23 Jan 2026 16:05:35 +0100
+Subject: Add versioning for hipfftw.so
+
+Summary: This patch is fully based on upstream commit
+dfa828c45abcbeb804876b973314cf39d2b3b2e8 and only has tiny
+additional fixes in the checking that hipfftw is not an alias
+
+Origin: upstream, https://github.com/ROCm/rocm-libraries/commit/dfa828c45abcbeb804876b973314cf39d2b3b2e8.patch
+Bug: https://github.com/ROCm/rocm-libraries/issues/2400
+Forwarded: not-needed

I see the difference:
-+ if (aliased_hipfftw_name STREQUAL "")
++ if (NOT aliased_hipfftw_name)

Could you please elaborate in the patch header why this was necessary? It's better than just to say "tiny change" and leave us guessing where it was and why :)

And why forwarding to upstream is not needed? Maybe it will become clear after the reasoning, but not yet.

review: Needs Information
Revision history for this message
Andreas Hasenack (ahasenack) wrote :

--- a/debian/patches/doxygen-refs.patch
+++ b/debian/patches/doxygen-refs.patch
@@ -1,28 +1,19 @@
-From: Kari Pahula <email address hidden>
-Date: Fri, 28 Jun 2024 16:07:05 +0300
-Subject: doxygen-refs
+From: Bojan Aleksovski <email address hidden>
+Date: Wed, 20 Jan 2026 20:55:15 +0100
+Subject: fix doxygen refs
...

What's the nature of these changes? This diff looks like the whole patch is being rewritten, including authorship. If it's indeed necessary, I would prefer if the original patch is dropped, and a new one is added with a different name.

Alternatively, to keep the existing name and authorship, you could add something about the change below the description, and add the Last-Updated DEP3 header with an updated date indicating when this change was made. But at a glance it looks like the first option (delete old, add new with new name) would be clearer?

And is this submittable to upstream?

review: Needs Information
Revision history for this message
Andreas Hasenack (ahasenack) wrote :

--- a/debian/rules
+++ b/debian/rules
@@ -17,8 +24,7 @@ CMAKE_FLAGS = \

 # The hipfft test client requires the rocfft sources to build.
 ifeq (,$(filter nocheck,$(DEB_BUILD_PROFILES)))
-CMAKE_FLAGS += -DBUILD_CLIENTS_TESTS=ON \
- -DUSE_HIPRAND=OFF
+CMAKE_FLAGS += -DBUILD_CLIENTS_TESTS=ON
 endif

 %:

I don't think this change is described in d/changelog, and why it's needed (remove USE_HIPRAND=OFF).

review: Needs Fixing
Revision history for this message
Andreas Hasenack (ahasenack) wrote :

d/changelog:
  * debian: new binary package libhipfftw0

Nit: that should be "* d/control: new binary package libhipfftw0"

While at it, you should probably also add a line about adding libhipfftw0 to the list of dependencies of libhipfft-dev.

Also in d/changelog:
  * d/control: fix hipfft-doc build

What in d/control fixed the hipfft-doc build? I think you meant d/rules?

review: Needs Fixing
Revision history for this message
Bojan Aleksovski (b0b0a) wrote :

Hello Andreas,

Regarding how the packages are organized, we took this direction after discussing with Cory in order to not have diff with Debian.

libhipfftw0 will be introduced as a new binary package.

Regarding the dev package, initially on the table was having separate hipfft and hipfftw dev packages but as they shared one header file we needed to introduce common-dev package that both would depend on. This option seemed far-fetched and it was agreed to leave the dev package as it is.

Regarding the tests package, on the table was renaming it to libhipfft-tests (dropping the 0 as the tests are for both bin pkgs), but as libhipfftw is basically a shim this seemed as not needed complexity as well.
Btw, this made me realize that I left out libhipfftw0 as a dependency of libhipfft0-tests, will fix that.

Revision history for this message
Bojan Aleksovski (b0b0a) wrote :

While I was writing the reasoning about how the packages are organized, I saw that you left the other notes. Thank you, I will provide reasoning and fixes soon. Have a great weekend!

Revision history for this message
Bojan Aleksovski (b0b0a) wrote :

Hello Andreas,

Please find the effort to have the comments fixed:
- Regarding add-so-version-for-libhipfftw0.patch, the "tiny change" to the fix has been elaborated in patch header and has been forwarded upstream
- Regarding doxygen-refs.patch, it has been dropped. A new fix-doxygen-refs.patch has been added which has been forwarded upstream
- Regarding the change that does `remove USE_HIPRAND=OFF`, it has been forgotten in the previous changelog entry 7.1.0-0ubuntu1. I've added it. That change refers to commit https://git.launchpad.net/~bullwinkle-team/ubuntu/+source/hipfft/commit/?h=bullwinkle/llvm-21/ubuntu/devel&id=88ca75c121a36f50d2e4e686a45058ff3548cab2
- Regarding the changelog entry for the new binary package libhipfftw0, it has been updated and a new line about having libhipfftw0 as a dependency for libhipfft-dev and libhipfft0-tests has been added
- The typo in the changelog about fix hipfft-doc build has been resolved

The easiest place to view the mentioned fixes is https://launchpadlibrarian.net/846723009/hipfft_7.1.0-0ubuntu2~ppa1_7.1.0-0ubuntu2~ppa2.diff.gz

Thank you!

Revision history for this message
Andreas Hasenack (ahasenack) wrote :

I noticed there is no symbols file for arm64, just amd64, but that is not a bug being introduced in this PR. At some point it should be fixed, though.

+1

review: Approve
Revision history for this message
Andreas Hasenack (ahasenack) wrote :

Sponsored:

Uploading hipfft_7.1.0-0ubuntu2.dsc
Uploading hipfft_7.1.0.orig.tar.gz
Uploading hipfft_7.1.0-0ubuntu2.debian.tar.xz
Uploading hipfft_7.1.0-0ubuntu2_source.buildinfo
Uploading hipfft_7.1.0-0ubuntu2_source.changes

This has a binary NEW package, so it will require an archive admin review before it lands in proposed.

Revision history for this message
Bojan Aleksovski (b0b0a) wrote :

Thank you Andreas. Bug has been created regarding the arm64 symbols (using rocblas bug as reference), so it gets fixed soon: https://bugs.launchpad.net/ubuntu/+source/hipfft/+bug/2141347

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1diff --git a/.github/CODEOWNERS b/.github/CODEOWNERS
2old mode 100755
3new mode 100644
4index eff670f..056a93f
5--- a/.github/CODEOWNERS
6+++ b/.github/CODEOWNERS
7@@ -1,8 +1,8 @@
8-* @af-ayala @eng-flavio-teixeira @evetsso @feizheng10 @malcolmroberts
9+* @af-ayala @eng-flavio-teixeira @evetsso @malcolmroberts @regan-amd
10 # Documentation files
11 docs/ @ROCm/rocm-documentation
12 *.md @ROCm/rocm-documentation
13 *.rst @ROCm/rocm-documentation
14 .readthedocs.yaml @ROCm/rocm-documentation
15 # Header directory for Doxygen documentation
16-library/include/ @ROCm/rocm-documentation @af-ayala @eng-flavio-teixeira @evetsso @feizheng10 @malcolmroberts
17+library/include/ @ROCm/rocm-documentation @af-ayala @eng-flavio-teixeira @evetsso @malcolmroberts @regan-amd
18diff --git a/.jenkins/common.groovy b/.jenkins/common.groovy
19deleted file mode 100644
20index e0aea53..0000000
21--- a/.jenkins/common.groovy
22+++ /dev/null
23@@ -1,92 +0,0 @@
24-import static groovy.io.FileType.FILES
25-
26-def runCompileCommand(platform, project, jobName, boolean sameOrg = false)
27-{
28- project.paths.construct_build_prefix()
29-
30- def getDependenciesCommand = ""
31- if (project.installLibraryDependenciesFromCI)
32- {
33- project.libraryDependencies.each
34- { libraryName ->
35- getDependenciesCommand += auxiliary.getLibrary(libraryName, platform.jenkinsLabel, null, sameOrg)
36- }
37- }
38-
39- String cmake = platform.jenkinsLabel.contains('centos') ? "cmake3" : "cmake"
40- String warningArgs = platform.jenkinsLabel.contains('cuda') ? '':'-DWERROR=ON'
41- String hipClang = platform.jenkinsLabel.contains('hipClang') ? "HIP_COMPILER=clang" : ""
42- String path = platform.jenkinsLabel.contains('centos7') ? "export PATH=/opt/rh/devtoolset-7/root/usr/bin:$PATH" : ":"
43- String dir = jobName.contains('Debug') ? "debug" : "release"
44-
45- // hipcc with CUDA backend needs HIP_PLATFORM set accordingly in the environment
46- String hipPlatformCommand = platform.jenkinsLabel.contains("cuda") ? "export HIP_PLATFORM=nvidia" : ""
47-
48- def command = """#!/usr/bin/env bash
49- set -x
50-
51- ls /fftw/lib
52- export FFTW_ROOT=/fftw
53- export FFTW_INCLUDE_PATH=\${FFTW_ROOT}/include
54- export FFTW_LIB_PATH=\${FFTW_ROOT}/lib
55- export LD_LIBRARY_PATH=\${FFTW_LIB_PATH}:/opt/rocm/lib:/opt/rocm/hip/lib
56- export CPLUS_INCLUDE_PATH=\${FFTW_INCLUDE_PATH}:\${CPLUS_INCLUDE_PATH}
57- export CMAKE_PREFIX_PATH=\${FFTW_LIB_PATH}/cmake/fftw3:\${CMAKE_PREFIX_PATH}
58- export CMAKE_PREFIX_PATH=\${FFTW_LIB_PATH}/cmake/fftw3f:\${CMAKE_PREFIX_PATH}
59- # default container flags cause problems for CUDA backend, and aren't useful for ROCm
60- unset HIPCC_COMPILE_FLAGS_APPEND
61- unset HIPCC_LINK_FLAGS_APPEND
62- ${hipPlatformCommand}
63-
64- cd ${project.paths.project_build_prefix}
65- mkdir -p build/${dir} && cd build/${dir}
66- ${getDependenciesCommand}
67- ${path}
68- ${hipClang} ${cmake} ${warningArgs} ${project.paths.build_command}
69- make -j\$(nproc)
70- """
71-
72- platform.runCommand(this, command)
73-}
74-
75-def runTestCommand (platform, project, gfilter)
76-{
77- String cudaArgs = platform.jenkinsLabel.contains('cuda') ? '--double_epsilon=5e-11'
78- : '--precompile=rocfft-test-precompile.db'
79-
80- def command = """#!/usr/bin/env bash
81- set -x
82- cd ${project.paths.project_build_prefix}/build/release/clients/staging
83- GTEST_LISTENER=NO_PASS_LINE_IN_LOG ./hipfft-test ${cudaArgs} --gtest_output=xml --gtest_color=yes --gtest_filter=${gfilter}
84- """
85-
86- platform.runCommand(this, command)
87- //junit "${project.paths.project_build_prefix}/build/release/clients/staging/*.xml"
88-}
89-
90-def runPackageCommand(platform, project, jobName, label='')
91-{
92- def command
93-
94- label = label != '' ? '-' + label.toLowerCase() : ''
95- String ext = platform.jenkinsLabel.contains('ubuntu') ? "deb" : "rpm"
96- String dir = jobName.contains('Debug') ? "debug" : "release"
97-
98- command = """
99- set -x
100- cd ${project.paths.project_build_prefix}/build/${dir}
101- make package
102- mkdir -p package
103- for f in hipfft*.$ext
104- do
105- mv "\$f" "hipfft${label}-\${f#*-}"
106- done
107- mv *.${ext} package/
108- """
109-
110- platform.runCommand(this, command)
111- platform.archiveArtifacts(this, """${project.paths.project_build_prefix}/build/${dir}/package/*.${ext}""")
112-}
113-
114-
115-return this
116diff --git a/.jenkins/debug.groovy b/.jenkins/debug.groovy
117deleted file mode 100644
118index abac808..0000000
119--- a/.jenkins/debug.groovy
120+++ /dev/null
121@@ -1,83 +0,0 @@
122-#!/usr/bin/env groovy
123-@Library('rocJenkins@pong') _
124-
125-import com.amd.project.*
126-import com.amd.docker.*
127-import java.nio.file.Path
128-
129-def runCI =
130-{
131- nodeDetails, jobName, buildCommand ->
132-
133- def prj = new rocProject('hipFFT', 'Debug')
134- // customize for project
135- prj.paths.build_command = buildCommand
136- prj.libraryDependencies = ['rocRAND', 'rocFFT', 'hipRAND']
137-
138- // Define test architectures, optional rocm version argument is available
139- def nodes = new dockerNodes(nodeDetails, jobName, prj)
140-
141- def commonGroovy
142-
143- boolean formatCheck = false
144-
145- def compileCommand =
146- {
147- platform, project->
148-
149- project.paths.construct_build_prefix()
150-
151- commonGroovy = load "${project.paths.project_src_prefix}/.jenkins/common.groovy"
152- commonGroovy.runCompileCommand(platform, project, jobName)
153- }
154-
155- buildProject(prj, formatCheck, nodes.dockerArray, compileCommand, null, null)
156-}
157-
158-def setupCI(urlJobName, jobNameList, buildCommand, runCI, label)
159-{
160- jobNameList = auxiliary.appendJobNameList(jobNameList)
161-
162- jobNameList.each
163- {
164- jobName, nodeDetails->
165- if (urlJobName == jobName)
166- stage(label + ' ' + jobName) {
167- runCI(nodeDetails, jobName, buildCommand, label)
168- }
169- }
170-
171- // For url job names that are not listed by the jobNameList i.e. compute-rocm-dkms-no-npi-1901
172- if(!jobNameList.keySet().contains(urlJobName))
173- {
174- properties(auxiliary.addCommonProperties([pipelineTriggers([cron('0 1 * * *')])]))
175- stage(label + ' ' + urlJobName) {
176- runCI([ubuntu18:['gfx906']], urlJobName, buildCommand, label)
177- }
178- }
179-}
180-
181-ci: {
182- String urlJobName = auxiliary.getTopJobName(env.BUILD_URL)
183-
184- def propertyList = ["compute-rocm-dkms-no-npi":[pipelineTriggers([cron('0 1 * * 0')])]]
185-
186- propertyList = auxiliary.appendPropertyList(propertyList)
187-
188- propertyList.each
189- {
190- jobName, property->
191- if (urlJobName == jobName)
192- properties(auxiliary.addCommonProperties(property))
193- }
194-
195- def hostJobNameList = ["compute-rocm-dkms-no-npi":([ubuntu18:['gfx900'],centos7:['gfx906'],sles15sp1:['gfx906']])]
196-
197- def hipClangJobNameList = ["compute-rocm-dkms-no-npi":([ubuntu18:['gfx900'],centos7:['gfx906'],sles15sp1:['gfx906']])]
198-
199- String hostBuildCommand = '-DCMAKE_CXX_COMPILER=g++ -DCMAKE_BUILD_TYPE=Debug -L ../..'
200- String hipClangBuildCommand = '-DCMAKE_CXX_COMPILER=/opt/rocm/bin/amdclang++ -DCMAKE_BUILD_TYPE=Debug -DBUILD_CLIENTS_TESTS=ON -DBUILD_CLIENTS_SAMPLES=ON -DBUILD_CLIENTS_SAMPLES=ON -L ../..'
201-
202- setupCI(urlJobName, hostJobNameList, hostBuildCommand, runCI, 'g++')
203- setupCI(urlJobName, hipClangJobNameList, hipClangBuildCommand, runCI, 'hip-clang')
204-}
205diff --git a/.jenkins/multigpu.groovy b/.jenkins/multigpu.groovy
206deleted file mode 100644
207index 9c315df..0000000
208--- a/.jenkins/multigpu.groovy
209+++ /dev/null
210@@ -1,96 +0,0 @@
211-#!/usr/bin/env groovy
212-@Library('rocJenkins@pong') _
213-
214-import com.amd.project.*
215-import com.amd.docker.*
216-import java.nio.file.Path
217-
218-def runCI =
219-{
220- nodeDetails, jobName, buildCommand, label, runTest ->
221-
222- def prj = new rocProject('hipFFT', 'multigpu')
223- // customize for project
224- prj.paths.build_command = buildCommand
225- prj.libraryDependencies = ['rocRAND', 'rocFFT', 'hipRAND']
226- prj.timeout.test = 360
227-
228- // Define test architectures, optional rocm version argument is available
229- def nodes = new dockerNodes(nodeDetails, jobName, prj)
230-
231- def commonGroovy
232-
233- boolean formatCheck = false
234-
235- def compileCommand =
236- {
237- platform, project->
238-
239- project.paths.construct_build_prefix()
240-
241- commonGroovy = load "${project.paths.project_src_prefix}/.jenkins/common.groovy"
242- commonGroovy.runCompileCommand(platform, project,jobName)
243- }
244-
245- def testCommand =
246- {
247- platform, project->
248-
249- def gfilter = "*multi_gpu*"
250- commonGroovy.runTestCommand(platform, project, gfilter)
251- }
252-
253- def packageCommand =
254- {
255- platform, project->
256-
257- commonGroovy.runPackageCommand(platform, project, jobName, label)
258- }
259-
260- buildProject(prj, formatCheck, nodes.dockerArray, compileCommand, runTest ? testCommand : null, packageCommand)
261-}
262-
263-def setupCI(urlJobName, jobNameList, buildCommand, runCI, label, runTest)
264-{
265- jobNameList = auxiliary.appendJobNameList(jobNameList)
266-
267- jobNameList.each
268- {
269- jobName, nodeDetails->
270- if (urlJobName == jobName)
271- stage(label + ' ' + jobName) {
272- runCI(nodeDetails, jobName, buildCommand, label, runTest)
273- }
274- }
275-
276- // For url job names that are not listed by the jobNameList i.e. compute-rocm-dkms-no-npi-1901
277- if(!jobNameList.keySet().contains(urlJobName))
278- {
279- properties(auxiliary.addCommonProperties([pipelineTriggers([cron('0 1 * * *')])]))
280- stage(label + ' ' + urlJobName) {
281- runCI([ubuntu18:['gfx906']], urlJobName, buildCommand, label)
282- }
283- }
284-}
285-
286-ci: {
287- String urlJobName = auxiliary.getTopJobName(env.BUILD_URL)
288-
289- def propertyList = ["main":[pipelineTriggers([cron('0 6 * * 0')])]]
290-
291- propertyList = auxiliary.appendPropertyList(propertyList)
292-
293- def jobNameList = ["main":([ubuntu20:['8gfx90a']])]
294- jobNameList = auxiliary.appendJobNameList(jobNameList)
295-
296- propertyList.each
297- {
298- jobName, property->
299- if (urlJobName == jobName)
300- properties(auxiliary.addCommonProperties(property))
301- }
302-
303- String hipClangBuildCommand = '-DCMAKE_CXX_COMPILER=/opt/rocm/bin/amdclang++ -DCMAKE_BUILD_TYPE=RelWithDebInfo -DBUILD_CLIENTS_TESTS=ON -DBUILD_CLIENTS_SAMPLES=ON -L ../..'
304-
305- setupCI(urlJobName, jobNameList, hipClangBuildCommand, runCI, 'hip-clang', true)
306-}
307diff --git a/.jenkins/staticanalysis.groovy b/.jenkins/staticanalysis.groovy
308deleted file mode 100644
309index dd54dde..0000000
310--- a/.jenkins/staticanalysis.groovy
311+++ /dev/null
312@@ -1,32 +0,0 @@
313-#!/usr/bin/env groovy
314-@Library('rocJenkins@pong') _
315-
316-import com.amd.project.*
317-import com.amd.docker.*
318-import java.nio.file.Path
319-
320-def runCI =
321-{
322- nodeDetails, jobName ->
323-
324- def prj = new rocProject('hipFFT-internal', 'PreCheckin')
325- // customize for project
326- prj.libraryDependencies = ['rocRAND','rocFFT']
327-
328- // Define test architectures, optional rocm version argument is available
329- def nodes = new dockerNodes(nodeDetails, jobName, prj)
330-
331- boolean formatCheck = true
332- boolean staticAnalysis = true
333-
334- buildProject(prj, formatCheck, nodes.dockerArray, null, null, null, staticAnalysis)
335-}
336-
337-ci: {
338- String urlJobName = auxiliary.getTopJobName(env.BUILD_URL)
339-
340- properties(auxiliary.addCommonProperties([pipelineTriggers([cron('0 1 * * 7')])]))
341- stage(urlJobName) {
342- runCI([ubuntu22:['any']], urlJobName)
343- }
344-}
345diff --git a/.jenkins/staticlibrary.groovy b/.jenkins/staticlibrary.groovy
346deleted file mode 100644
347index 1e99ac3..0000000
348--- a/.jenkins/staticlibrary.groovy
349+++ /dev/null
350@@ -1,98 +0,0 @@
351-#!/usr/bin/env groovy
352-@Library('rocJenkins@pong') _
353-
354-import com.amd.project.*
355-import com.amd.docker.*
356-import java.nio.file.Path
357-
358-def runCI =
359-{
360- nodeDetails, jobName, buildCommand, label ->
361-
362- def prj = new rocProject('hipFFT-internal', 'StaticLibrary')
363- // customize for project
364- prj.paths.build_command = buildCommand
365- prj.libraryDependencies = ['rocRAND','rocFFT']
366-
367- // Define test architectures, optional rocm version argument is available
368- def nodes = new dockerNodes(nodeDetails, jobName, prj)
369-
370- def commonGroovy
371-
372- boolean formatCheck = false
373-
374- def compileCommand =
375- {
376- platform, project->
377-
378- project.paths.construct_build_prefix()
379-
380- commonGroovy = load "${project.paths.project_src_prefix}/.jenkins/common.groovy"
381- commonGroovy.runCompileCommand(platform, project, jobName, true)
382- }
383-
384- def testCommand =
385- {
386- platform, project->
387-
388- def gfilter = "*"
389- commonGroovy.runTestCommand(platform, project, gfilter)
390- }
391-
392- def packageCommand =
393- {
394- platform, project->
395-
396- commonGroovy.runPackageCommand(platform, project, jobName, label)
397- }
398-
399- buildProject(prj, formatCheck, nodes.dockerArray, compileCommand, testCommand, packageCommand)
400-}
401-
402-def setupCI(urlJobName, jobNameList, buildCommand, runCI, label)
403-{
404- jobNameList = auxiliary.appendJobNameList(jobNameList)
405-
406- jobNameList.each
407- {
408- jobName, nodeDetails->
409- if (urlJobName == jobName)
410- stage(label + ' ' + jobName) {
411- runCI(nodeDetails, jobName, buildCommand, label)
412- }
413- }
414-
415- // For url job names that are not listed by the jobNameList i.e. compute-rocm-dkms-no-npi-1901
416- if(!jobNameList.keySet().contains(urlJobName))
417- {
418- properties(auxiliary.addCommonProperties([pipelineTriggers([cron('0 1 * * *')])]))
419- stage(label + ' ' + urlJobName) {
420- runCI([ubuntu16:['gfx906']], urlJobName, buildCommand, label)
421- }
422- }
423-}
424-
425-ci: {
426- String urlJobName = auxiliary.getTopJobName(env.BUILD_URL)
427-
428- def propertyList = ["compute-rocm-dkms-no-npi":[pipelineTriggers([cron('0 1 * * 0')])]]
429-
430- propertyList = auxiliary.appendPropertyList(propertyList)
431-
432- propertyList.each
433- {
434- jobName, property->
435- if (urlJobName == jobName)
436- properties(auxiliary.addCommonProperties(property))
437- }
438-
439- def hostJobNameList = ["compute-rocm-dkms-no-npi-hipclang":([ubuntu18:['gfx900']])]
440-
441- def hipClangJobNameList = ["compute-rocm-dkms-no-npi-hipclang":([ubuntu18:['gfx900']])]
442-
443- String hostBuildCommand = '-DCMAKE_CXX_COMPILER=g++ -DCMAKE_BUILD_TYPE=RelWithDebInfo -DBUILD_SHARED_LIBS=OFF -L ../..'
444- String hipClangBuildCommand = '-DCMAKE_CXX_COMPILER=/opt/rocm/bin/amdclang++ -DCMAKE_BUILD_TYPE=RelWithDebInfo -DBUILD_CLIENTS_TESTS=ON -DBUILD_CLIENTS_SAMPLES=ON -DBUILD_SHARED_LIBS=OFF -L ../..'
445-
446- setupCI(urlJobName, hostJobNameList, hostBuildCommand, runCI, 'g++')
447- setupCI(urlJobName, hipClangJobNameList, hipClangBuildCommand, runCI, 'hip-clang')
448-}
449diff --git a/CHANGELOG.md b/CHANGELOG.md
450index d9c3dd8..257c7a5 100644
451--- a/CHANGELOG.md
452+++ b/CHANGELOG.md
453@@ -3,6 +3,26 @@
454 Documentation for hipFFT is available at
455 [https://rocm.docs.amd.com/projects/hipFFT/en/latest/](https://rocm.docs.amd.com/projects/hipFFT/en/latest/).
456
457+## hipFFT 1.0.21 for ROCm 7.1.0
458+
459+### Added
460+
461+* Improved test coverage of multi-stream plans.
462+* Improved test coverage of user-specified work areas.
463+* Improved test coverage of default stride calculation.
464+* **[Experimental]** Introduced the hipFFTW library, interfacing rocFFT on AMD platforms using the same symbols as FFTW3 (with partial support).
465+
466+## hipFFT 1.0.20 for ROCm 7.0.0
467+
468+### Added
469+
470+* Added gfx950 support.
471+
472+### Removed
473+
474+* Removed hipfft-rider legacy compatibility from clients
475+* Remove support for the gfx940 and gfx941 targets from the client programs.
476+
477 ## hipFFT 1.0.18 for ROCm 6.4.0
478
479 ### Added
480@@ -10,8 +30,7 @@ Documentation for hipFFT is available at
481 * Implemented the `hipfftMpAttachComm`, `hipfftXtSetDistribution`, and `hipfftXtSetSubformatDefault` APIs to allow
482 computing FFTs that are distributed between multiple MPI (Message Passing Interface) processes. These APIs can be enabled
483 with the `HIPFFT_MPI_ENABLE` CMake option, which defaults to `OFF`.
484-
485- The backend FFT library called by hipFFT must support MPI for these APIs to work.
486+* The backend FFT library called by hipFFT must support MPI for these APIs to work.
487
488 ### Changed
489
490diff --git a/CMakeLists.txt b/CMakeLists.txt
491index 7f81897..8f802ac 100644
492--- a/CMakeLists.txt
493+++ b/CMakeLists.txt
494@@ -1,5 +1,5 @@
495 # #############################################################################
496-# Copyright (C) 2020 - 2022 Advanced Micro Devices, Inc. All rights reserved.
497+# Copyright (C) 2020 - 2025 Advanced Micro Devices, Inc. All rights reserved.
498 #
499 # Permission is hereby granted, free of charge, to any person obtaining a copy
500 # of this software and associated documentation files (the "Software"), to deal
501@@ -37,7 +37,6 @@ else()
502 "Install path prefix, prepended onto install directories" )
503 endif()
504
505-
506 # Workarounds..
507 list( APPEND CMAKE_PREFIX_PATH /opt/rocm/llvm /opt/rocm )
508 list( APPEND CMAKE_MODULE_PATH ${ROCM_PATH}/lib/cmake/hip /opt/rocm/lib/cmake/hip /opt/rocm/hip/cmake )
509@@ -83,9 +82,8 @@ set(DEFAULT_GPUS
510 gfx906
511 gfx908
512 gfx90a
513- gfx940
514- gfx941
515 gfx942
516+ gfx950
517 gfx1030
518 gfx1100
519 gfx1101
520@@ -101,8 +99,6 @@ if(BUILD_ADDRESS_SANITIZER)
521 SET(DEFAULT_GPUS
522 gfx908:xnack+
523 gfx90a:xnack+
524- gfx940:xnack+
525- gfx941:xnack+
526 gfx942:xnack+)
527 add_link_options(-fuse-ld=lld)
528 add_compile_definitions(ADDRESS_SANITIZER)
529@@ -167,14 +163,8 @@ if (BUILD_WITH_COMPILER STREQUAL "HIP-NVCC" )
530 set( WARNING_FLAGS ${NVCC_WARNING_FLAGS} )
531
532 else()
533- # Define GPU targets
534- if(AMDGPU_TARGETS AND NOT GPU_TARGETS)
535- message( DEPRECATION "AMDGPU_TARGETS use is deprecated. Use GPU_TARGETS." )
536- endif()
537- set(AMDGPU_TARGETS "${DEFAULT_GPUS}" CACHE STRING "Target default GPUs if AMDGPU_TARGETS is not defined. (Deprecated, prefer GPU_TARGETS)")
538- rocm_check_target_ids(AMDGPU_TARGETS TARGETS "${AMDGPU_TARGETS}")
539- # Don't force, users should be able to override GPU_TARGETS at the command line if desired
540- set(GPU_TARGETS "${AMDGPU_TARGETS}" CACHE STRING "GPU architectures to build for")
541+ # Check support for the GPU target(s)
542+ rocm_check_target_ids(GPU_TARGETS TARGETS "${GPU_TARGETS}")
543 if( BUILD_WITH_COMPILER STREQUAL "HIP-CLANG" )
544 set( HIP_PLATFORM "amd" )
545 set( HIP_COMPILER "clang" )
546@@ -184,22 +174,11 @@ endif()
547 # Show the actual compiler(internal option)
548 message(STATUS "BUILD_WITH_COMPILER = " ${BUILD_WITH_COMPILER})
549
550-# FOR HANDLING ENABLE/DISABLE OPTIONAL BACKWARD COMPATIBILITY for FILE/FOLDER REORG
551-option(BUILD_FILE_REORG_BACKWARD_COMPATIBILITY "Build with file/folder reorg with backward compatibility enabled" OFF)
552-if(BUILD_FILE_REORG_BACKWARD_COMPATIBILITY AND NOT WIN32)
553- rocm_wrap_header_dir(
554- ${CMAKE_SOURCE_DIR}/library/include
555- PATTERNS "*.h"
556- GUARDS SYMLINK WRAPPER
557- WRAPPER_LOCATIONS ${CMAKE_INSTALL_INCLUDEDIR}
558- )
559-endif()
560-
561 # Version
562-set( VERSION_STRING "1.0.18" )
563+set( VERSION_STRING "1.0.21" )
564 set( hipfft_SOVERSION 0.1 )
565
566-if( ROCM_FOUND )
567+if( ROCmCMakeBuildTools_FOUND )
568 rocm_setup_version( VERSION ${VERSION_STRING} )
569 endif()
570
571@@ -262,7 +241,9 @@ endif()
572 if(WIN32)
573 set(CPACK_SOURCE_GENERATOR "ZIP")
574 set(CPACK_GENERATOR "ZIP")
575- set(CMAKE_INSTALL_PREFIX "C:/hipSDK" CACHE PATH "Install path" FORCE)
576+ if(CMAKE_INSTALL_PREFIX_INITIALIZED_TO_DEFAULT)
577+ set(CMAKE_INSTALL_PREFIX "C:/hipSDK" CACHE PATH "Install path" FORCE)
578+ endif()
579 set(INSTALL_PREFIX "C:/hipSDK")
580 set(CPACK_SET_DESTDIR OFF)
581 set(CPACK_PACKAGE_INSTALL_DIRECTORY "C:/hipSDK")
582@@ -270,7 +251,7 @@ if(WIN32)
583 set(CPACK_INCLUDE_TOPLEVEL_DIRECTORY OFF)
584 endif()
585
586-if( ROCM_FOUND )
587+if( ROCmCMakeBuildTools_FOUND )
588 # Package specific CPACK vars
589 if( NOT BUILD_WITH_LIB STREQUAL "CUDA" )
590 rocm_package_add_dependencies(DEPENDS "rocfft >= 1.0.21")
591diff --git a/LICENSE.md b/LICENSE.md
592index f081475..ee33c80 100644
593--- a/LICENSE.md
594+++ b/LICENSE.md
595@@ -1,6 +1,6 @@
596 MIT License
597
598-Copyright (C) 2016 - 2025 Advanced Micro Devices, Inc. All rights reserved.
599+Copyright (C) Advanced Micro Devices, Inc.
600
601 Permission is hereby granted, free of charge, to any person obtaining a copy
602 of this software and associated documentation files (the "Software"), to deal
603@@ -20,6 +20,8 @@ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
604 OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
605 SOFTWARE.
606
607+---
608+
609 This product includes software from copyright holders as shown below, and distributed under their license terms as specified.
610
611 CLI11 2.2 Copyright (c) 2017-2024 University of Cincinnati, developed by Henry
612diff --git a/README.md b/README.md
613index d47fd12..5961742 100644
614--- a/README.md
615+++ b/README.md
616@@ -1,7 +1,7 @@
617 # hipFFT
618
619 hipFFT is an FFT marshalling library that supports
620-[rocFFT](https://github.com/ROCmSoftwarePlatform/rocFFT) and
621+[rocFFT](https://github.com/ROCm/rocm-libraries/tree/develop/projects/rocfft) and
622 [cuFFT](https://developer.nvidia.com/cufft) backends.
623
624 hipFFT exports an interface that doesn't require the client to change, regardless of the chosen backend.
625@@ -11,12 +11,12 @@ and marshals results back to your application.
626 ## Documentation
627
628 > [!NOTE]
629-> The published hipFFT documentation is available at [hipFFT](https://rocm.docs.amd.com/projects/hipFFT/en/latest/index.html) in an organized, easy-to-read format, with search and a table of contents. The documentation source files reside in the hipFFT/docs folder of this repository. As with all ROCm projects, the documentation is open source. For more information, see [Contribute to ROCm documentation](https://rocm.docs.amd.com/en/latest/contribute/contributing.html).
630+> The published hipFFT documentation is available at [hipFFT](https://rocm.docs.amd.com/projects/hipFFT/en/latest/index.html) in an organized, easy-to-read format, with search and a table of contents. The documentation source files reside in the projects/hipfft/docs folder of the rocm-libraries repository. As with all ROCm projects, the documentation is open source. For more information, see [Contribute to ROCm documentation](https://rocm.docs.amd.com/en/latest/contribute/contributing.html).
631
632 To build our documentation locally, run the following code:
633
634 ```bash
635-cd docs
636+cd projects/hipfft/docs
637
638 pip3 install -r sphinx/requirements.txt
639
640@@ -36,14 +36,13 @@ To build hipFFT from source, follow these steps:
641
642 1. Install the library build dependencies:
643
644- * On AMD platforms, you must install [rocFFT](https://github.com/ROCmSoftwarePlatform/rocFFT).
645- * On NVIDIA platforms, you must install [cuFFT](https://developer.nvidia.com/cufft).
646+ * On AMD platforms, you must install [rocFFT](https://github.com/ROCm/rocm-libraries/tree/develop/projects/rocfft).
647
648 2. Install the client build dependencies:
649
650 * The clients (samples, tests, etc) included with the hipFFT source depend on hipRAND, FFTW and GoogleTest.
651
652-3. Build hipFFT:
653+3. Build hipFFT. Run these commands from the `rocm-libraries/projects/hipfft` directory:
654
655 To show all build options:
656
657@@ -52,40 +51,42 @@ To build hipFFT from source, follow these steps:
658 cmake -LH ..
659 ```
660
661-Here are some CMake build examples:
662+Here are some CMake build examples for an AMD GPU:
663
664-* AMD GPU
665- * Case: Build a project using HIP language APIs + hipFFT with standard host compiler
666- * Code: `cmake -DCMAKE_CXX_COMPILER=g++ -DCMAKE_BUILD_TYPE=Release -L ..`
667- * Case: Build a project using HIP language APIs + hipFFT + device kernels with HIP-Clang
668- * Code: `cmake -DCMAKE_CXX_COMPILER=amdclang++ -DCMAKE_BUILD_TYPE=Release -DBUILD_CLIENTS=ON -L ..`
669-* NVIDIA GPU
670- * Case: Build a project using HIP language APIs + hipFFT with standard host compiler
671- * Code: `cmake -DCMAKE_CXX_COMPILER=g++ -DCMAKE_BUILD_TYPE=Release -DBUILD_WITH_LIB=CUDA -L ..`
672- * Case: Build a project using HIP language APIs + hipFFT + device kernels with HIP-NVCC
673- * Code: `HIP_PLATFORM=nvidia cmake -DCMAKE_CXX_COMPILER=hipcc -DCMAKE_BUILD_TYPE=Release -DBUILD_CLIENTS=ON -L ..`
674+* Case: Build a project using HIP language APIs + hipFFT with standard host compiler
675+ * Code: `cmake -DCMAKE_CXX_COMPILER=g++ -DCMAKE_BUILD_TYPE=Release -L ..`
676+* Case: Build a project using HIP language APIs + hipFFT + device kernels with HIP-Clang
677+ * Code: `cmake -DCMAKE_CXX_COMPILER=amdclang++ -DCMAKE_BUILD_TYPE=Release -DBUILD_CLIENTS=ON -L ..`
678
679 ```note
680 The `-DBUILD_CLIENTS=ON` option is only allowed with the amdclang++ or HIPCC compilers.
681 ```
682
683+## Code Coverage
684+You can generate a test coverage report with the following:
685+
686+```bash
687+cmake -DCMAKE_CXX_COMPILER=amdclang++ -DBUILD_CLIENTS_SAMPLES=ON -DBUILD_CLIENTS_TESTS=ON -DBUILD_CODE_COVERAGE=ON <optional: -DCOVERAGE_TEST_OPTIONS="cmdline args to pass to hipfft-test (default: --smoketest)"> ..
688+make -j coverage
689+```
690+The commands above will output the coverage report to the terminal and save an html coverage report to `$PWD/coverage-report`. Note that hipFFT uses llvm for code coverage, which only works with clang compilers.
691+
692 ## Porting from CUDA
693
694 If you have existing CUDA code and want to transition to HIP, follow these steps:
695
696 1. [HIPIFY](https://github.com/ROCm-Developer-Tools/HIPIFY) your code and fix all unsupported CUDA
697 features and user-defined macros
698-2. Build with HIP-NVCC to run on an NVIDIA device
699-3. Build with HIP-Clang to run on an AMD device
700+2. Build with HIP-Clang to run on an AMD device
701
702 More information about porting to HIP is available in the
703 [HIP porting guide](https://rocm.docs.amd.com/projects/HIP/en/develop/user_guide/hip_porting_guide.html).
704
705 ## Support
706
707-You can report bugs and feature requests through the GitHub
708-[issue tracker](https://github.com/ROCm/hipFFT/issues).
709+You can report bugs and feature requests through the rocm-libraries GitHub
710+[issue tracker](https://github.com/ROCm/rocm-libraries/issues).
711
712 ## Contribute
713
714-If you want to contribute to hipFFT, you must follow our [contribution guidelines](https://github.com/ROCm/hipFFT/blob/develop/.github/CONTRIBUTING.md).
715+If you want to contribute to hipFFT, you must follow our [contribution guidelines](https://github.com/ROCm/rocm-libraries/blob/develop/projects/hipfft/.github/CONTRIBUTING.md).
716diff --git a/clients/CMakeLists.txt b/clients/CMakeLists.txt
717index 739ef03..67de6e8 100644
718--- a/clients/CMakeLists.txt
719+++ b/clients/CMakeLists.txt
720@@ -88,9 +88,9 @@ if( BUILD_CLIENTS_SAMPLES )
721 endif( )
722
723 if( BUILD_CLIENTS_TESTS )
724- find_package( GTest 1.11.0 )
725+ find_package( GTest QUIET )
726 include( ExternalProject )
727- if( NOT GTEST_FOUND )
728+ if( NOT GTest_FOUND )
729 set( GTEST_INCLUDE_DIRS
730 ${CMAKE_CURRENT_BINARY_DIR}/src/gtest/googletest/include
731 )
732diff --git a/clients/bench/CMakeLists.txt b/clients/bench/CMakeLists.txt
733index c6da335..4cece34 100644
734--- a/clients/bench/CMakeLists.txt
735+++ b/clients/bench/CMakeLists.txt
736@@ -93,19 +93,3 @@ set_target_properties( hipfft-bench
737 ${BENCH_OUT_DIR} )
738
739 rocm_install(TARGETS hipfft-bench COMPONENT benchmarks)
740-
741-# install compatibility for old name of bench program - symlink on
742-# unix, hardlink on windows (since privilege is required to create
743-# symlinks there)
744-if( WIN32 )
745- set( BENCH_LINK_COMMAND create_hardlink )
746- set( BENCH_NEW_NAME ${CMAKE_INSTALL_PREFIX}/${CMAKE_INSTALL_BINDIR}/hipfft-bench${CMAKE_EXECUTABLE_SUFFIX} )
747- set( BENCH_OLD_NAME ${CMAKE_INSTALL_PREFIX}/${CMAKE_INSTALL_BINDIR}/hipfft-rider${CMAKE_EXECUTABLE_SUFFIX} )
748-else()
749- set( BENCH_LINK_COMMAND create_symlink )
750- set( BENCH_NEW_NAME hipfft-bench )
751- set( BENCH_OLD_NAME ${CMAKE_INSTALL_PREFIX}/${CMAKE_INSTALL_BINDIR}/hipfft-rider )
752-endif()
753-install(
754- CODE "execute_process( COMMAND \"${CMAKE_COMMAND}\" -E ${BENCH_LINK_COMMAND} \"${BENCH_NEW_NAME}\" \"${BENCH_OLD_NAME}\" )"
755-)
756diff --git a/clients/bench/bench.cpp b/clients/bench/bench.cpp
757index c9cfed9..52cf888 100644
758--- a/clients/bench/bench.cpp
759+++ b/clients/bench/bench.cpp
760@@ -31,6 +31,9 @@
761 #include "../../shared/client_except.h"
762 #include "../../shared/gpubuf.h"
763
764+// initialize static class member of hipfft_params
765+std::vector<gpubuf> hipfft_params::externally_managed_workareas = std::vector<gpubuf>();
766+
767 int main(int argc, char* argv[])
768 {
769 // This helps with mixing output of both wide and narrow characters to the screen
770@@ -79,6 +82,11 @@ int main(int argc, char* argv[])
771 "forward\n3) real inverse")
772 ->default_val(fft_transform_type_complex_forward);
773 non_token
774+ ->add_option("--auto_allocation",
775+ params.auto_allocate,
776+ "hipFFT's auto-allocation behavior: \"on\", \"off\", or \"default\"")
777+ ->default_val("default");
778+ non_token
779 ->add_option(
780 "--precision", params.precision, "Transform precision: single (default), double, half")
781 ->excludes("--double");
782diff --git a/clients/hipfft_params.h b/clients/hipfft_params.h
783index e0fc576..f5bcdb4 100644
784--- a/clients/hipfft_params.h
785+++ b/clients/hipfft_params.h
786@@ -32,12 +32,28 @@
787 #include "../shared/hipfft_brick.h"
788 #include "hipfft/hipfft.h"
789 #include "hipfft/hipfftXt.h"
790+#include <random>
791
792 #ifdef HIPFFT_MPI_ENABLE
793 #include "hipfft/hipfftMp.h"
794 #include <mpi.h>
795 #endif
796
797+template <typename T,
798+ typename... Args,
799+ std::enable_if_t<std::is_integral_v<T> && (std::is_same_v<T, Args> && ...), bool> = true>
800+static void set_with_random_nonnegative_values(const std::string& token, T& val, Args&... args)
801+{
802+ // using a hash of the token as random seed to avoid
803+ // dependencies on externally-defined variables
804+ std::hash<std::string> hasher;
805+ std::ranlux24_base gen(hasher(token));
806+ std::uniform_int_distribution<T> dis(static_cast<T>(0), std::numeric_limits<T>::max());
807+ val = dis(gen);
808+ ((args = dis(gen)), ...);
809+ return;
810+}
811+
812 inline fft_status fft_status_from_hipfftparams(const hipfftResult_t val)
813 {
814 switch(val)
815@@ -141,6 +157,13 @@ public:
816 std::vector<long long int> ll_inembed;
817 std::vector<long long int> ll_onembed;
818
819+ template <typename T>
820+ struct many_api_layout_args
821+ {
822+ T *input_embed, *output_embed;
823+ T input_stride, output_stride, input_distance, output_distance;
824+ };
825+
826 struct hipLibXtDesc_deleter
827 {
828 void operator()(hipLibXtDesc* d)
829@@ -160,22 +183,46 @@ public:
830
831 // backend library can write N worksize values for N GPUs, so
832 // allocate a vector for that if necessary
833- std::vector<size_t> xt_worksize;
834+ std::vector<size_t> auto_allocated_worksizes;
835+ // if auto_allocate == fft_auto_allocation_off, the hipFFT plan(s)
836+ // will be provided with externally-managed work area(s):
837+ static std::vector<gpubuf> externally_managed_workareas;
838
839- // pointer we pass to the backend library. By default point to the
840- // single-GPU workbuffer size.
841- size_t* workbuffersize_ptr;
842+ size_t auto_allocated_extra_vram_footprint() const
843+ {
844+ return std::accumulate(auto_allocated_worksizes.begin(),
845+ auto_allocated_worksizes.end(),
846+ static_cast<size_t>(0));
847+ }
848+
849+ static size_t externally_managed_extra_vram_footprint()
850+ {
851+ return std::accumulate(externally_managed_workareas.begin(),
852+ externally_managed_workareas.end(),
853+ static_cast<size_t>(0),
854+ [](size_t total, const gpubuf& buf) { return total + buf.size(); });
855+ }
856
857- hipfft_params()
858+ bool is_preventing_auto_allocation_at_generation() const
859 {
860- workbuffersize_ptr = &workbuffersize;
861+ if(auto_allocate != fft_auto_allocation_off)
862+ return false;
863+ // Let hipFFT sometimes auto-allocate nonetheless so that tests cover its
864+ // ability to free resources (allocated at generation) when/if some
865+ // externally-managed workarea(s) are provided after plan generation
866+ // Note: this member function must return the same result even if called
867+ // more than once by a given instance, it must be stable for any instance
868+ return std::hash<std::string>()(token()) % 2 == 1;
869 }
870
871+ hipfft_params() = default;
872+
873 hipfft_params(const fft_params& p)
874 : fft_params(p)
875 {
876- workbuffersize_ptr = &workbuffersize;
877 }
878+ hipfft_params(hipfft_params&& p) = default;
879+ hipfft_params& operator=(hipfft_params&& other) = default;
880
881 ~hipfft_params()
882 {
883@@ -206,12 +253,14 @@ public:
884 }
885 catch(fft_params::work_buffer_alloc_failure& e)
886 {
887- val += workbuffersize;
888+ val += auto_allocated_extra_vram_footprint();
889+ val += externally_managed_extra_vram_footprint();
890 std::stringstream msg;
891 msg << "Plan work buffer size (" << val << " bytes raw data) too large for device";
892 throw ROCFFT_SKIP{msg.str()};
893 }
894- val += workbuffersize;
895+ val += auto_allocated_extra_vram_footprint();
896+ val += externally_managed_extra_vram_footprint();
897 return val;
898 }
899
900@@ -336,6 +385,11 @@ public:
901 int_inembed[i] = ll_inembed[i];
902 int_onembed[i] = ll_onembed[i];
903 }
904+ // reset auto_allocated_worksizes
905+ auto_allocated_worksizes.resize(get_num_used_gpus());
906+ std::for_each(auto_allocated_worksizes.begin(),
907+ auto_allocated_worksizes.end(),
908+ [](decltype(auto_allocated_worksizes)::value_type& val) { val = 0; });
909
910 hipfftResult ret = HIPFFT_SUCCESS;
911 return fft_status_from_hipfftparams(ret);
912@@ -402,19 +456,47 @@ public:
913 }
914 }
915
916+ if(ret == HIPFFT_SUCCESS && auto_allocate == fft_auto_allocation_off)
917+ {
918+ ret = set_externally_managed_work_areas();
919+ }
920+
921 // hipFFT can fail plan creation due to allocation failure -
922 // tests are expecting a specific exception in that case,
923 // because the test was unable to run. Doesn't mean the test
924 // case failed.
925 if(ret == HIPFFT_ALLOC_FAILED)
926- throw fft_params::work_buffer_alloc_failure(
927- "plan create failed due to allocation failure");
928+ {
929+ if(!final_attempt_at_plan_creation && externally_managed_extra_vram_footprint() > 0)
930+ {
931+ final_attempt_at_plan_creation = true;
932+ // device allocation(s) in externally_managed_workareas might be
933+ // larger than needed or even unnecessary for the instance of interest.
934+ // Free them up and try again before concluding.
935+ externally_managed_workareas.clear();
936+ return create_plan();
937+ }
938+ else
939+ {
940+ throw fft_params::work_buffer_alloc_failure(
941+ "plan create failed due to allocation failure",
942+ externally_managed_extra_vram_footprint()
943+ + auto_allocated_extra_vram_footprint());
944+ }
945+ }
946
947 // store token to check if plan was already made
948 current_token = token();
949 return fft_status_from_hipfftparams(ret);
950 }
951
952+ hipfftResult_t set_stream(hipStream_t stream)
953+ {
954+ if(plan == INVALID_PLAN_HANDLE)
955+ throw std::runtime_error("Plan must be created before setting a desired stream");
956+ return hipfftSetStream(plan, stream);
957+ }
958+
959 void validate_fields() const override
960 {
961 validate_brick_volume();
962@@ -454,10 +536,12 @@ public:
963 }
964 }
965
966- fft_status set_callbacks(void* load_cb_host,
967- void* load_cb_data,
968- void* store_cb_host,
969- void* store_cb_data) override
970+ fft_status set_callbacks(void* load_cb_host,
971+ void* load_cb_data,
972+ void* store_cb_host,
973+ void* store_cb_data,
974+ size_t load_cb_shared_mem_bytes = 0,
975+ size_t store_cb_shared_mem_bytes = 0) override
976 {
977 if(run_callbacks)
978 {
979@@ -476,6 +560,16 @@ public:
980 plan, &store_cb_host, HIPFFT_CB_ST_COMPLEX, &store_cb_data);
981 if(ret != HIPFFT_SUCCESS)
982 return fft_status_from_hipfftparams(ret);
983+
984+ ret = hipfftXtSetCallbackSharedSize(
985+ plan, HIPFFT_CB_LD_REAL, load_cb_shared_mem_bytes);
986+ if(ret != HIPFFT_SUCCESS)
987+ return fft_status_from_hipfftparams(ret);
988+
989+ ret = hipfftXtSetCallbackSharedSize(
990+ plan, HIPFFT_CB_ST_COMPLEX, store_cb_shared_mem_bytes);
991+ if(ret != HIPFFT_SUCCESS)
992+ return fft_status_from_hipfftparams(ret);
993 break;
994 case HIPFFT_D2Z:
995 ret = hipfftXtSetCallback(
996@@ -487,6 +581,16 @@ public:
997 plan, &store_cb_host, HIPFFT_CB_ST_COMPLEX_DOUBLE, &store_cb_data);
998 if(ret != HIPFFT_SUCCESS)
999 return fft_status_from_hipfftparams(ret);
1000+
1001+ ret = hipfftXtSetCallbackSharedSize(
1002+ plan, HIPFFT_CB_LD_REAL_DOUBLE, load_cb_shared_mem_bytes);
1003+ if(ret != HIPFFT_SUCCESS)
1004+ return fft_status_from_hipfftparams(ret);
1005+
1006+ ret = hipfftXtSetCallbackSharedSize(
1007+ plan, HIPFFT_CB_ST_COMPLEX_DOUBLE, store_cb_shared_mem_bytes);
1008+ if(ret != HIPFFT_SUCCESS)
1009+ return fft_status_from_hipfftparams(ret);
1010 break;
1011 case HIPFFT_C2R:
1012 ret = hipfftXtSetCallback(plan, &load_cb_host, HIPFFT_CB_LD_COMPLEX, &load_cb_data);
1013@@ -496,6 +600,16 @@ public:
1014 ret = hipfftXtSetCallback(plan, &store_cb_host, HIPFFT_CB_ST_REAL, &store_cb_data);
1015 if(ret != HIPFFT_SUCCESS)
1016 return fft_status_from_hipfftparams(ret);
1017+
1018+ ret = hipfftXtSetCallbackSharedSize(
1019+ plan, HIPFFT_CB_LD_COMPLEX, load_cb_shared_mem_bytes);
1020+ if(ret != HIPFFT_SUCCESS)
1021+ return fft_status_from_hipfftparams(ret);
1022+
1023+ ret = hipfftXtSetCallbackSharedSize(
1024+ plan, HIPFFT_CB_ST_REAL, store_cb_shared_mem_bytes);
1025+ if(ret != HIPFFT_SUCCESS)
1026+ return fft_status_from_hipfftparams(ret);
1027 break;
1028 case HIPFFT_Z2D:
1029 ret = hipfftXtSetCallback(
1030@@ -507,6 +621,16 @@ public:
1031 plan, &store_cb_host, HIPFFT_CB_ST_REAL_DOUBLE, &store_cb_data);
1032 if(ret != HIPFFT_SUCCESS)
1033 return fft_status_from_hipfftparams(ret);
1034+
1035+ ret = hipfftXtSetCallbackSharedSize(
1036+ plan, HIPFFT_CB_LD_COMPLEX_DOUBLE, load_cb_shared_mem_bytes);
1037+ if(ret != HIPFFT_SUCCESS)
1038+ return fft_status_from_hipfftparams(ret);
1039+
1040+ ret = hipfftXtSetCallbackSharedSize(
1041+ plan, HIPFFT_CB_ST_REAL_DOUBLE, store_cb_shared_mem_bytes);
1042+ if(ret != HIPFFT_SUCCESS)
1043+ return fft_status_from_hipfftparams(ret);
1044 break;
1045 case HIPFFT_C2C:
1046 ret = hipfftXtSetCallback(plan, &load_cb_host, HIPFFT_CB_LD_COMPLEX, &load_cb_data);
1047@@ -517,6 +641,16 @@ public:
1048 plan, &store_cb_host, HIPFFT_CB_ST_COMPLEX, &store_cb_data);
1049 if(ret != HIPFFT_SUCCESS)
1050 return fft_status_from_hipfftparams(ret);
1051+
1052+ ret = hipfftXtSetCallbackSharedSize(
1053+ plan, HIPFFT_CB_LD_COMPLEX, load_cb_shared_mem_bytes);
1054+ if(ret != HIPFFT_SUCCESS)
1055+ return fft_status_from_hipfftparams(ret);
1056+
1057+ ret = hipfftXtSetCallbackSharedSize(
1058+ plan, HIPFFT_CB_ST_COMPLEX, store_cb_shared_mem_bytes);
1059+ if(ret != HIPFFT_SUCCESS)
1060+ return fft_status_from_hipfftparams(ret);
1061 break;
1062 case HIPFFT_Z2Z:
1063 ret = hipfftXtSetCallback(
1064@@ -528,6 +662,16 @@ public:
1065 plan, &store_cb_host, HIPFFT_CB_ST_COMPLEX_DOUBLE, &store_cb_data);
1066 if(ret != HIPFFT_SUCCESS)
1067 return fft_status_from_hipfftparams(ret);
1068+
1069+ ret = hipfftXtSetCallbackSharedSize(
1070+ plan, HIPFFT_CB_LD_COMPLEX_DOUBLE, load_cb_shared_mem_bytes);
1071+ if(ret != HIPFFT_SUCCESS)
1072+ return fft_status_from_hipfftparams(ret);
1073+
1074+ ret = hipfftXtSetCallbackSharedSize(
1075+ plan, HIPFFT_CB_ST_COMPLEX_DOUBLE, store_cb_shared_mem_bytes);
1076+ if(ret != HIPFFT_SUCCESS)
1077+ return fft_status_from_hipfftparams(ret);
1078 break;
1079 default:
1080 throw std::runtime_error("Invalid execution type");
1081@@ -921,17 +1065,301 @@ private:
1082 CREATE_XT_MAKE_PLAN_MANY,
1083 };
1084
1085+ // check that worksize estimates can be successfully queried with or without a valid plan
1086+ hipfftResult_t check_worksize_estimate()
1087+ {
1088+ hipfftResult_t ret{HIPFFT_INTERNAL_ERROR};
1089+ if(!hipfft_transform_type)
1090+ {
1091+ throw std::runtime_error("Estimating worksize requires a valid type of transform");
1092+ }
1093+ std::vector<size_t> worksize_estimate(get_num_used_gpus(), absurd_init_worksize_estimate);
1094+ switch(get_create_type())
1095+ {
1096+ case CREATE_MAKE_PLAN_Nd:
1097+ {
1098+ switch(dim())
1099+ {
1100+ case 1:
1101+ if(plan == INVALID_PLAN_HANDLE)
1102+ ret = hipfftEstimate1d(
1103+ int_length[0], *hipfft_transform_type, nbatch, worksize_estimate.data());
1104+ else
1105+ ret = hipfftGetSize1d(plan,
1106+ int_length[0],
1107+ *hipfft_transform_type,
1108+ nbatch,
1109+ worksize_estimate.data());
1110+ break;
1111+ case 2:
1112+ if(plan == INVALID_PLAN_HANDLE)
1113+ ret = hipfftEstimate2d(int_length[0],
1114+ int_length[1],
1115+ *hipfft_transform_type,
1116+ worksize_estimate.data());
1117+ else
1118+ ret = hipfftGetSize2d(plan,
1119+ int_length[0],
1120+ int_length[1],
1121+ *hipfft_transform_type,
1122+ worksize_estimate.data());
1123+ break;
1124+ case 3:
1125+ if(plan == INVALID_PLAN_HANDLE)
1126+ ret = hipfftEstimate3d(int_length[0],
1127+ int_length[1],
1128+ int_length[2],
1129+ *hipfft_transform_type,
1130+ worksize_estimate.data());
1131+ else
1132+ ret = hipfftGetSize3d(plan,
1133+ int_length[0],
1134+ int_length[1],
1135+ int_length[2],
1136+ *hipfft_transform_type,
1137+ worksize_estimate.data());
1138+ break;
1139+ default:
1140+ throw std::runtime_error("invalid dim");
1141+ }
1142+ break;
1143+ }
1144+ case CREATE_MAKE_PLAN_MANY:
1145+ {
1146+ auto layout_args = make_valid_layout_args_for_plan_many<int>();
1147+ if(plan == INVALID_PLAN_HANDLE)
1148+ ret = hipfftEstimateMany(dim(),
1149+ int_length.data(),
1150+ layout_args.input_embed,
1151+ layout_args.input_stride,
1152+ layout_args.input_distance,
1153+ layout_args.output_embed,
1154+ layout_args.output_stride,
1155+ layout_args.output_distance,
1156+ *hipfft_transform_type,
1157+ nbatch,
1158+ worksize_estimate.data());
1159+ else
1160+ ret = hipfftGetSizeMany(plan,
1161+ dim(),
1162+ int_length.data(),
1163+ layout_args.input_embed,
1164+ layout_args.input_stride,
1165+ layout_args.input_distance,
1166+ layout_args.output_embed,
1167+ layout_args.output_stride,
1168+ layout_args.output_distance,
1169+ *hipfft_transform_type,
1170+ nbatch,
1171+ worksize_estimate.data());
1172+ break;
1173+ }
1174+ case CREATE_MAKE_PLAN_MANY64:
1175+ {
1176+ if(plan == INVALID_PLAN_HANDLE)
1177+ {
1178+ // no direct equivalent in estimate-fetching APIs
1179+ std::for_each(worksize_estimate.begin(),
1180+ worksize_estimate.end(),
1181+ [](decltype(worksize_estimate)::value_type& val) { val = 0; });
1182+ ret = HIPFFT_SUCCESS;
1183+ }
1184+ else
1185+ {
1186+ auto layout_args = make_valid_layout_args_for_plan_many<long long>();
1187+ ret = hipfftGetSizeMany64(plan,
1188+ dim(),
1189+ ll_length.data(),
1190+ layout_args.input_embed,
1191+ layout_args.input_stride,
1192+ layout_args.input_distance,
1193+ layout_args.output_embed,
1194+ layout_args.output_stride,
1195+ layout_args.output_distance,
1196+ *hipfft_transform_type,
1197+ nbatch,
1198+ worksize_estimate.data());
1199+ }
1200+ break;
1201+ }
1202+ case CREATE_XT_MAKE_PLAN_MANY:
1203+ {
1204+ if(plan == INVALID_PLAN_HANDLE)
1205+ {
1206+ // no direct equivalent in estimate-fetching APIs
1207+ std::for_each(worksize_estimate.begin(),
1208+ worksize_estimate.end(),
1209+ [](decltype(worksize_estimate)::value_type& val) { val = 0; });
1210+ ret = HIPFFT_SUCCESS;
1211+ }
1212+ else
1213+ {
1214+ auto executionType = get_xt_api_execution_type();
1215+ auto layout_args = make_valid_layout_args_for_plan_many<long long>();
1216+ ret = hipfftXtGetSizeMany(plan,
1217+ dim(),
1218+ ll_length.data(),
1219+ layout_args.input_embed,
1220+ layout_args.input_stride,
1221+ layout_args.input_distance,
1222+ inputType,
1223+ layout_args.output_embed,
1224+ layout_args.output_stride,
1225+ layout_args.output_distance,
1226+ outputType,
1227+ nbatch,
1228+ worksize_estimate.data(),
1229+ executionType);
1230+ }
1231+ break;
1232+ }
1233+ case PLAN_Nd:
1234+ case PLAN_MANY:
1235+ default:
1236+ {
1237+ // should be indirectly disabled via get_create_type()
1238+ return HIPFFT_INTERNAL_ERROR;
1239+ }
1240+ }
1241+ // check that the value(s) of worksize_estimate were actually set, assuming that
1242+ // setting a worksize_estimate equal to absurd_init_worksize_estimate by hipFFT
1243+ // cannot be considered "correct".
1244+ // Note: worksize_estimate value(s) are *not* guaranteed to be greater than or equal
1245+ // to the actual value(s) of the work area(s), queriable after plan generation via
1246+ // hipfftGetSize.
1247+ if(ret == HIPFFT_SUCCESS)
1248+ {
1249+ // the estimate can't have any knowledge about the number of GPUs being used if
1250+ // the plan wasn't created first
1251+ const size_t num_values_to_check
1252+ = plan == INVALID_PLAN_HANDLE ? 1 : worksize_estimate.size();
1253+ for(size_t idx = 0; ret == HIPFFT_SUCCESS && idx < num_values_to_check; idx++)
1254+ {
1255+ ret = worksize_estimate[idx] != absurd_init_worksize_estimate
1256+ ? HIPFFT_SUCCESS
1257+ : HIPFFT_INTERNAL_ERROR;
1258+ }
1259+ }
1260+ return ret;
1261+ }
1262+
1263+ // provide a work area to a successfully generated plan
1264+ hipfftResult_t set_externally_managed_work_areas()
1265+ {
1266+ std::vector<size_t> req_workarea_sizes(get_num_used_gpus(), absurd_init_worksize_estimate);
1267+ hipfftResult_t ret = hipfftGetSize(plan, req_workarea_sizes.data());
1268+ if(ret != HIPFFT_SUCCESS)
1269+ {
1270+ return ret;
1271+ }
1272+ else if(std::any_of(req_workarea_sizes.begin(),
1273+ req_workarea_sizes.end(),
1274+ [](const decltype(req_workarea_sizes)::value_type& val) {
1275+ return val == absurd_init_worksize_estimate;
1276+ }))
1277+ {
1278+ return HIPFFT_INTERNAL_ERROR;
1279+ }
1280+ // req_workarea_sizes are known and validated
1281+ // check if the current externally_managed_workareas can be used as is or not
1282+ if(externally_managed_workareas.size() < get_num_used_gpus())
1283+ externally_managed_workareas.resize(get_num_used_gpus());
1284+ std::vector<void*> workareas(get_num_used_gpus(), nullptr);
1285+ for(auto workarea_idx = 0; workarea_idx < get_num_used_gpus(); workarea_idx++)
1286+ {
1287+ const auto req_size = req_workarea_sizes[workarea_idx];
1288+ auto& buf = externally_managed_workareas[workarea_idx];
1289+ if(buf.size() < req_size)
1290+ {
1291+ // too small, free and reallocate to meet current needs
1292+ buf.free();
1293+ if(buf.alloc(req_size) != hipSuccess)
1294+ {
1295+ return HIPFFT_ALLOC_FAILED;
1296+ }
1297+ }
1298+ workareas[workarea_idx] = buf.data();
1299+ }
1300+ if(get_num_used_gpus() > 1)
1301+ {
1302+ // TODO: enable below once hipfftXtSetWorkArea is enabled
1303+#if(0)
1304+ ret = hipfftXtSetWorkArea(plan, workareas.data);
1305+#else
1306+ throw unimplemented_exception(
1307+ "No implementation support for externally-managed work areas with multi-gpu usage");
1308+#endif
1309+ }
1310+ else
1311+ {
1312+ ret = hipfftSetWorkArea(plan, workareas[0]);
1313+ }
1314+ if(ret == HIPFFT_SUCCESS)
1315+ {
1316+ // the above "SetWorkArea" frees auto_allocated worksizes (if any)
1317+ auto_allocated_worksizes.clear();
1318+ }
1319+ return ret;
1320+ }
1321+
1322 // return true if we need to use hipFFT APIs that separate plan
1323 // allocation and plan init
1324 bool need_separate_create_make() const
1325 {
1326- // scale factor and multi-GPU need API calls between create +
1327- // init
1328- if(scale_factor != 1.0 || multiGPU > 1 || mp_lib != fft_mp_lib_none)
1329+ // scale factor and multi-GPU and disabled auto-allocation need API
1330+ // calls between create + init
1331+ if(scale_factor != 1.0 || multiGPU > 1 || mp_lib != fft_mp_lib_none
1332+ || auto_allocate == fft_auto_allocation_off)
1333 return true;
1334 return false;
1335 }
1336
1337+ template <
1338+ typename T,
1339+ std::enable_if_t<std::is_same_v<T, int> || std::is_same_v<T, long long int>, bool> = true>
1340+ many_api_layout_args<T> make_valid_layout_args_for_plan_many()
1341+ {
1342+ many_api_layout_args<T> ret;
1343+ if constexpr(std::is_same_v<T, int>)
1344+ {
1345+ ret.input_embed = int_inembed.data();
1346+ ret.output_embed = int_onembed.data();
1347+ }
1348+ else
1349+ {
1350+ ret.input_embed = ll_inembed.data();
1351+ ret.output_embed = ll_onembed.data();
1352+ }
1353+ ret.input_stride = static_cast<T>(istride.back());
1354+ ret.output_stride = static_cast<T>(ostride.back());
1355+ ret.input_distance = static_cast<T>(idist);
1356+ ret.output_distance = static_cast<T>(odist);
1357+ if(is_using_default_layout())
1358+ {
1359+ // If using a default layout, users can
1360+ // (A) either set explicitly inembed, onembed, strides, and distances (like above);
1361+ // (B) or use nullptr as arguments for inembed and onembed. Strides and
1362+ // distances are supposed to be ignored in that case.
1363+ // --> choose randomly between either valid usage when a default layout is
1364+ // used, so that all possible valid use case scenarios are considered.
1365+ const std::string test_token = token();
1366+ int randomizer;
1367+ set_with_random_nonnegative_values(test_token, randomizer);
1368+ if(randomizer % 2 == 0)
1369+ {
1370+ ret.input_embed = nullptr;
1371+ ret.output_embed = nullptr;
1372+ // FIXME: negative values are not truly ignored for now.
1373+ set_with_random_nonnegative_values(test_token,
1374+ ret.input_stride,
1375+ ret.output_stride,
1376+ ret.input_distance,
1377+ ret.output_distance);
1378+ }
1379+ }
1380+ return ret;
1381+ }
1382+
1383 // Not all plan options work with all creation types. Return a
1384 // suitable plan creation type for the current FFT parameters.
1385 int get_create_type()
1386@@ -996,15 +1424,16 @@ private:
1387 }
1388 hipfftResult_t create_plan_many()
1389 {
1390- auto ret = hipfftPlanMany(&plan,
1391+ auto layout_args = make_valid_layout_args_for_plan_many<int>();
1392+ auto ret = hipfftPlanMany(&plan,
1393 dim(),
1394 int_length.data(),
1395- int_inembed.data(),
1396- istride.back(),
1397- idist,
1398- int_onembed.data(),
1399- ostride.back(),
1400- odist,
1401+ layout_args.input_embed,
1402+ layout_args.input_stride,
1403+ layout_args.input_distance,
1404+ layout_args.output_embed,
1405+ layout_args.output_stride,
1406+ layout_args.output_distance,
1407 *hipfft_transform_type,
1408 nbatch);
1409 return ret;
1410@@ -1014,7 +1443,14 @@ private:
1411 // relevant pre-Make APIs (scale factor, XtSetGPUs)
1412 hipfftResult_t create_with_pre_make()
1413 {
1414- auto ret = hipfftCreate(&plan);
1415+ hipfftResult_t ret{HIPFFT_INVALID_PLAN};
1416+ if(auto_allocate == fft_auto_allocation_off)
1417+ {
1418+ ret = check_worksize_estimate(); // read worksize estimate before plan creation
1419+ if(ret != HIPFFT_SUCCESS)
1420+ return ret;
1421+ }
1422+ ret = hipfftCreate(&plan);
1423 if(ret != HIPFFT_SUCCESS)
1424 return ret;
1425 if(scale_factor != 1.0)
1426@@ -1026,7 +1462,8 @@ private:
1427 if(multiGPU > 1)
1428 {
1429 int deviceCount = 0;
1430- (void)hipGetDeviceCount(&deviceCount);
1431+ if(hipGetDeviceCount(&deviceCount) != hipSuccess)
1432+ throw std::runtime_error("hipGetDeviceCount failed");
1433
1434 // ensure that users request less than or equal to the total number of devices
1435 if(static_cast<int>(multiGPU) > deviceCount)
1436@@ -1035,9 +1472,8 @@ private:
1437 std::vector<int> GPUs(multiGPU);
1438 std::iota(GPUs.begin(), GPUs.end(), 0);
1439 ret = hipfftXtSetGPUs(plan, static_cast<int>(multiGPU), GPUs.data());
1440-
1441- xt_worksize.resize(GPUs.size());
1442- workbuffersize_ptr = xt_worksize.data();
1443+ if(ret != HIPFFT_SUCCESS)
1444+ return ret;
1445 }
1446 if(mp_lib == fft_mp_lib_mpi)
1447 {
1448@@ -1090,34 +1526,52 @@ private:
1449 input_stride.data(),
1450 output_stride.data());
1451 }
1452+ if(ret != HIPFFT_SUCCESS)
1453+ return ret;
1454 #else
1455 throw std::runtime_error("MPI is not enabled");
1456 #endif
1457 }
1458+ if(auto_allocate == fft_auto_allocation_off)
1459+ {
1460+ ret = check_worksize_estimate(); // read worksize estimate again after plan creation
1461+ if(ret != HIPFFT_SUCCESS)
1462+ return ret;
1463+ }
1464+ if(is_preventing_auto_allocation_at_generation())
1465+ {
1466+ ret = hipfftSetAutoAllocation(plan, 0);
1467+ }
1468+
1469 return ret;
1470 }
1471
1472 hipfftResult_t create_make_plan_Nd()
1473 {
1474 auto ret = create_with_pre_make();
1475+
1476 if(ret != HIPFFT_SUCCESS)
1477 return ret;
1478-
1479+ // do not register plan's worksizes as "auto-allocated" if auto-allocation was explicitly prevented
1480+ std::vector<size_t> tmp_worksize(get_num_used_gpus());
1481+ size_t* worksize_ptr = is_preventing_auto_allocation_at_generation()
1482+ ? tmp_worksize.data()
1483+ : auto_allocated_worksizes.data();
1484 switch(dim())
1485 {
1486 case 1:
1487 return hipfftMakePlan1d(
1488- plan, int_length[0], *hipfft_transform_type, nbatch, workbuffersize_ptr);
1489+ plan, int_length[0], *hipfft_transform_type, nbatch, worksize_ptr);
1490 case 2:
1491 return hipfftMakePlan2d(
1492- plan, int_length[0], int_length[1], *hipfft_transform_type, workbuffersize_ptr);
1493+ plan, int_length[0], int_length[1], *hipfft_transform_type, worksize_ptr);
1494 case 3:
1495 return hipfftMakePlan3d(plan,
1496 int_length[0],
1497 int_length[1],
1498 int_length[2],
1499 *hipfft_transform_type,
1500- workbuffersize_ptr);
1501+ worksize_ptr);
1502 default:
1503 throw std::runtime_error("invalid dim");
1504 }
1505@@ -1128,18 +1582,24 @@ private:
1506 auto ret = create_with_pre_make();
1507 if(ret != HIPFFT_SUCCESS)
1508 return ret;
1509+ // do not register plan's worksizes as "auto-allocated" if auto-allocation was explicitly prevented
1510+ std::vector<size_t> tmp_worksize(get_num_used_gpus());
1511+ size_t* worksize_ptr = is_preventing_auto_allocation_at_generation()
1512+ ? tmp_worksize.data()
1513+ : auto_allocated_worksizes.data();
1514+ auto layout_args = make_valid_layout_args_for_plan_many<int>();
1515 return hipfftMakePlanMany(plan,
1516 dim(),
1517 int_length.data(),
1518- int_inembed.data(),
1519- istride.back(),
1520- idist,
1521- int_onembed.data(),
1522- ostride.back(),
1523- odist,
1524+ layout_args.input_embed,
1525+ layout_args.input_stride,
1526+ layout_args.input_distance,
1527+ layout_args.output_embed,
1528+ layout_args.output_stride,
1529+ layout_args.output_distance,
1530 *hipfft_transform_type,
1531 nbatch,
1532- workbuffersize_ptr);
1533+ worksize_ptr);
1534 }
1535
1536 hipfftResult_t create_make_plan_many64()
1537@@ -1147,58 +1607,85 @@ private:
1538 auto ret = create_with_pre_make();
1539 if(ret != HIPFFT_SUCCESS)
1540 return ret;
1541+
1542+ // do not register plan's worksizes as "auto-allocated" if auto-allocation was explicitly prevented
1543+ std::vector<size_t> tmp_worksize(get_num_used_gpus());
1544+ size_t* worksize_ptr = is_preventing_auto_allocation_at_generation()
1545+ ? tmp_worksize.data()
1546+ : auto_allocated_worksizes.data();
1547+ auto layout_args = make_valid_layout_args_for_plan_many<long long int>();
1548 return hipfftMakePlanMany64(plan,
1549 dim(),
1550 ll_length.data(),
1551- ll_inembed.data(),
1552- istride.back(),
1553- idist,
1554- ll_onembed.data(),
1555- ostride.back(),
1556- odist,
1557+ layout_args.input_embed,
1558+ layout_args.input_stride,
1559+ layout_args.input_distance,
1560+ layout_args.output_embed,
1561+ layout_args.output_stride,
1562+ layout_args.output_distance,
1563 *hipfft_transform_type,
1564 nbatch,
1565- workbuffersize_ptr);
1566+ worksize_ptr);
1567 }
1568
1569- hipfftResult_t create_xt_make_plan_many()
1570+ hipDataType get_xt_api_execution_type() const
1571 {
1572- auto ret = create_with_pre_make();
1573- if(ret != HIPFFT_SUCCESS)
1574- return ret;
1575-
1576 // execution type is always complex, matching the precision
1577 // of the transform
1578 // Initializing as double by default
1579- hipDataType executionType = HIP_C_64F;
1580+ hipDataType ret = HIP_C_64F;
1581 switch(precision)
1582 {
1583 case fft_precision_half:
1584- executionType = HIP_C_16F;
1585+ ret = HIP_C_16F;
1586 break;
1587 case fft_precision_single:
1588- executionType = HIP_C_32F;
1589+ ret = HIP_C_32F;
1590 break;
1591 case fft_precision_double:
1592- executionType = HIP_C_64F;
1593+ ret = HIP_C_64F;
1594 break;
1595+ default:
1596+ throw std::runtime_error("Invalid precision");
1597 }
1598+ return ret;
1599+ }
1600
1601+ hipfftResult_t create_xt_make_plan_many()
1602+ {
1603+ auto ret = create_with_pre_make();
1604+ if(ret != HIPFFT_SUCCESS)
1605+ return ret;
1606+
1607+ // do not register plan's worksizes as "auto-allocated" if auto-allocation was explicitly prevented
1608+ std::vector<size_t> tmp_worksize(get_num_used_gpus());
1609+ size_t* worksize_ptr = is_preventing_auto_allocation_at_generation()
1610+ ? tmp_worksize.data()
1611+ : auto_allocated_worksizes.data();
1612+ auto executionType = get_xt_api_execution_type();
1613+ auto layout_args = make_valid_layout_args_for_plan_many<long long int>();
1614 return hipfftXtMakePlanMany(plan,
1615 dim(),
1616 ll_length.data(),
1617- ll_inembed.data(),
1618- istride.back(),
1619- idist,
1620+ layout_args.input_embed,
1621+ layout_args.input_stride,
1622+ layout_args.input_distance,
1623 inputType,
1624- ll_onembed.data(),
1625- ostride.back(),
1626- odist,
1627+ layout_args.output_embed,
1628+ layout_args.output_stride,
1629+ layout_args.output_distance,
1630 outputType,
1631 nbatch,
1632- workbuffersize_ptr,
1633+ worksize_ptr,
1634 executionType);
1635 }
1636+ static constexpr size_t absurd_init_worksize_estimate = std::numeric_limits<size_t>::max();
1637+ bool final_attempt_at_plan_creation = false;
1638+
1639+ size_t get_num_used_gpus() const
1640+ {
1641+ return multiGPU > 1 ? multiGPU : 1;
1642+ };
1643 };
1644
1645 #endif
1646diff --git a/clients/hipfftw_helper.h b/clients/hipfftw_helper.h
1647new file mode 100644
1648index 0000000..3e86134
1649--- /dev/null
1650+++ b/clients/hipfftw_helper.h
1651@@ -0,0 +1,1238 @@
1652+// Copyright (C) 2025 Advanced Micro Devices, Inc. All rights reserved.
1653+//
1654+// Permission is hereby granted, free of charge, to any person obtaining a copy
1655+// of this software and associated documentation files (the "Software"), to deal
1656+// in the Software without restriction, including without limitation the rights
1657+// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
1658+// copies of the Software, and to permit persons to whom the Software is
1659+// furnished to do so, subject to the following conditions:
1660+//
1661+// The above copyright notice and this permission notice shall be included in
1662+// all copies or substantial portions of the Software.
1663+//
1664+// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
1665+// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
1666+// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
1667+// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
1668+// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
1669+// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
1670+// THE SOFTWARE.
1671+
1672+#ifndef HIPFFTW_HELPER_H
1673+#define HIPFFTW_HELPER_H
1674+
1675+#include "../shared/environment.h"
1676+#include "../shared/fft_params.h"
1677+#include <algorithm>
1678+#include <fftw3.h>
1679+#include <memory>
1680+#include <sstream>
1681+#include <stdexcept>
1682+#include <string>
1683+#include <type_traits>
1684+
1685+#ifdef WIN32
1686+#include <windows.h>
1687+// psapi.h requires windows.h to be included first
1688+#include <psapi.h>
1689+typedef HMODULE LIB_HANDLE_T;
1690+#else
1691+#include <dlfcn.h>
1692+#include <link.h>
1693+typedef void* LIB_HANDLE_T;
1694+#endif
1695+
1696+template <fft_precision prec>
1697+struct hipfftw_trait;
1698+template <>
1699+struct hipfftw_trait<fft_precision_single>
1700+{
1701+ using plan_t = fftwf_plan;
1702+ using complex_t = fftwf_complex;
1703+ using real_t = float;
1704+};
1705+template <>
1706+struct hipfftw_trait<fft_precision_double>
1707+{
1708+ using plan_t = fftw_plan;
1709+ using complex_t = fftw_complex;
1710+ using real_t = double;
1711+};
1712+
1713+template <fft_precision prec>
1714+using hipfftw_real_t = typename hipfftw_trait<prec>::real_t;
1715+template <fft_precision prec>
1716+using hipfftw_complex_t = typename hipfftw_trait<prec>::complex_t;
1717+template <fft_precision prec>
1718+using hipfftw_plan_t = typename hipfftw_trait<prec>::plan_t;
1719+
1720+// singleton class encapsulating the dynamically-loaded hipfftw library
1721+class dynamically_loaded_hipfftw
1722+{
1723+private:
1724+ LIB_HANDLE_T lib_handle;
1725+ std::ostringstream load_error_info;
1726+
1727+ dynamically_loaded_hipfftw()
1728+ {
1729+#ifdef __HIP_PLATFORM_AMD__
1730+ const std::string lib_basename = "hipfftw";
1731+#else
1732+ const std::string lib_basename = "cufftw";
1733+#endif
1734+#ifdef WIN32
1735+ const std::string lib_fullame = lib_basename + ".dll";
1736+ lib_handle = LoadLibraryA(lib_fullame.c_str());
1737+#else
1738+ const std::string lib_fullame = "lib" + lib_basename + ".so";
1739+ lib_handle = dlopen(lib_fullame.c_str(), RTLD_LAZY);
1740+#endif
1741+ load_error_info.clear();
1742+ if(!lib_handle)
1743+ {
1744+ load_error_info << "failed to open library " << lib_fullame;
1745+#ifdef WIN32
1746+ load_error_info << ". System's error code = " << GetLastError();
1747+#else
1748+ load_error_info << ". System's error message = " << dlerror();
1749+#endif
1750+ // do not throw from here to ease exception handling
1751+ }
1752+ }
1753+ /* disable copies and moves */
1754+ dynamically_loaded_hipfftw(const dynamically_loaded_hipfftw&) = delete;
1755+ dynamically_loaded_hipfftw(dynamically_loaded_hipfftw&&) = delete;
1756+ dynamically_loaded_hipfftw& operator=(const dynamically_loaded_hipfftw&) = delete;
1757+ dynamically_loaded_hipfftw& operator=(dynamically_loaded_hipfftw&&) = delete;
1758+
1759+ static const dynamically_loaded_hipfftw& get_instance()
1760+ {
1761+ static dynamically_loaded_hipfftw singleton_instance;
1762+ return singleton_instance;
1763+ }
1764+
1765+public:
1766+ static LIB_HANDLE_T get_lib()
1767+ {
1768+ return get_instance().lib_handle;
1769+ }
1770+ static std::string get_load_error_info()
1771+ {
1772+ return get_instance().load_error_info.str();
1773+ }
1774+ ~dynamically_loaded_hipfftw()
1775+ {
1776+ if(lib_handle)
1777+ {
1778+#ifdef WIN32
1779+ (void)FreeLibrary(lib_handle);
1780+#else
1781+ (void)dlclose(lib_handle);
1782+#endif
1783+ }
1784+ lib_handle = nullptr;
1785+ }
1786+};
1787+
1788+// exception specific to issues when loading hipfftw and/or when fetching
1789+// the address of the supposedly-available functions therefrom
1790+struct hipfftw_undefined_function_ptr : std::runtime_error
1791+{
1792+ using std::runtime_error::runtime_error;
1793+};
1794+
1795+// helper struct for retrieving a function's return type
1796+template <class T>
1797+struct func_ret;
1798+template <typename R, class... Args>
1799+struct func_ret<R(Args...)>
1800+{
1801+ using type = R;
1802+};
1803+template <class T>
1804+using func_ret_t = typename func_ret<T>::type;
1805+
1806+template <typename func_type, std::enable_if_t<std::is_function_v<func_type>, bool> = true>
1807+struct dynamically_loaded_function_t
1808+{
1809+private:
1810+ // address of the desired function, to be fetched from a dynamically loaded shared library
1811+ func_type* func_ptr;
1812+ // address of the reference function (from linked fftw3)
1813+ func_type* const reference_func_ptr;
1814+ // symbol of said function
1815+ std::string func_symbol;
1816+
1817+public:
1818+ dynamically_loaded_function_t(const char* symbol, func_type* ref_func_address)
1819+ : func_ptr(nullptr)
1820+ , reference_func_ptr(ref_func_address)
1821+ , func_symbol(symbol)
1822+ {
1823+ }
1824+
1825+ // forwarding functional calls
1826+ template <typename... Args>
1827+ func_ret_t<func_type> operator()(Args... args) const
1828+ {
1829+ if(!may_be_used())
1830+ throw hipfftw_undefined_function_ptr(dynamically_loaded_hipfftw::get_load_error_info());
1831+ return func_ptr(args...);
1832+ }
1833+ template <bool call_reference, typename... Args>
1834+ func_ret_t<func_type> call(Args... args) const
1835+ {
1836+ if constexpr(!call_reference)
1837+ {
1838+ return this->operator()(args...);
1839+ }
1840+ else
1841+ {
1842+ if(!reference_func_ptr)
1843+ throw hipfftw_undefined_function_ptr(
1844+ "Ill-defined reference function pointer for symbol " + func_symbol);
1845+ return reference_func_ptr(args...);
1846+ }
1847+ // unreachable
1848+ }
1849+ void load_implementation()
1850+ {
1851+ const auto hipfftw_lib = dynamically_loaded_hipfftw::get_lib();
1852+ if(!hipfftw_lib)
1853+ {
1854+ // make func_ptr unambiguously unset to force the dedicated exception
1855+ // to be thrown at forwarded functional call(s)
1856+ func_ptr = nullptr;
1857+ return;
1858+ }
1859+#ifdef WIN32
1860+ func_ptr = reinterpret_cast<func_type*>(GetProcAddress(hipfftw_lib, func_symbol.c_str()));
1861+#else
1862+ func_ptr = reinterpret_cast<func_type*>(dlsym(hipfftw_lib, func_symbol.c_str()));
1863+#endif
1864+ }
1865+ bool may_be_used() const
1866+ {
1867+ return func_ptr != nullptr;
1868+ }
1869+ std::string get_symbol() const
1870+ {
1871+ return func_symbol;
1872+ }
1873+};
1874+
1875+template <typename T, typename... Args>
1876+static void load_implementations(dynamically_loaded_function_t<T>& first, Args&... others)
1877+{
1878+ first.load_implementation();
1879+ if constexpr(sizeof...(others) > 0)
1880+ load_implementations(others...);
1881+}
1882+
1883+// define singleton structures encapsulating all the hipfftw function
1884+// pointers (one specialization per supported precision)
1885+template <fft_precision prec>
1886+struct hipfftw_funcs;
1887+
1888+#define HIPFFTW_STRINGIFY(x) #x
1889+#define HIPFFTW_DECLARE_DYNAMICALLY_LOADED_FUNCTION_POINTER(prefix, func) \
1890+ dynamically_loaded_function_t<decltype(prefix##func)> func \
1891+ = dynamically_loaded_function_t<decltype(prefix##func)>(HIPFFTW_STRINGIFY(prefix##func), \
1892+ &(prefix##func));
1893+
1894+#define HIPFFTW_FUNCS_SPECIALIZATION(prefix, specialization) \
1895+ template <> \
1896+ struct hipfftw_funcs<specialization> \
1897+ { \
1898+ private: \
1899+ hipfftw_funcs() \
1900+ { \
1901+ load_implementations(malloc, \
1902+ alloc_real, \
1903+ alloc_complex, \
1904+ free, \
1905+ destroy_plan, \
1906+ cleanup, \
1907+ execute, \
1908+ plan_dft_1d, \
1909+ plan_dft_2d, \
1910+ plan_dft_3d, \
1911+ plan_dft, \
1912+ plan_dft_r2c_1d, \
1913+ plan_dft_r2c_2d, \
1914+ plan_dft_r2c_3d, \
1915+ plan_dft_r2c, \
1916+ plan_dft_c2r_1d, \
1917+ plan_dft_c2r_2d, \
1918+ plan_dft_c2r_3d, \
1919+ plan_dft_c2r, \
1920+ print_plan, \
1921+ set_timelimit, \
1922+ cost, \
1923+ flops); \
1924+ } \
1925+ /* disable copies and moves */ \
1926+ hipfftw_funcs(const hipfftw_funcs&) = delete; \
1927+ hipfftw_funcs& operator=(const hipfftw_funcs&) = delete; \
1928+ hipfftw_funcs(hipfftw_funcs&&) = delete; \
1929+ hipfftw_funcs& operator=(hipfftw_funcs&&) = delete; \
1930+ \
1931+ public: \
1932+ HIPFFTW_DECLARE_DYNAMICALLY_LOADED_FUNCTION_POINTER(prefix, malloc) \
1933+ HIPFFTW_DECLARE_DYNAMICALLY_LOADED_FUNCTION_POINTER(prefix, alloc_real) \
1934+ HIPFFTW_DECLARE_DYNAMICALLY_LOADED_FUNCTION_POINTER(prefix, alloc_complex) \
1935+ HIPFFTW_DECLARE_DYNAMICALLY_LOADED_FUNCTION_POINTER(prefix, free) \
1936+ HIPFFTW_DECLARE_DYNAMICALLY_LOADED_FUNCTION_POINTER(prefix, destroy_plan) \
1937+ HIPFFTW_DECLARE_DYNAMICALLY_LOADED_FUNCTION_POINTER(prefix, cleanup) \
1938+ HIPFFTW_DECLARE_DYNAMICALLY_LOADED_FUNCTION_POINTER(prefix, execute) \
1939+ HIPFFTW_DECLARE_DYNAMICALLY_LOADED_FUNCTION_POINTER(prefix, plan_dft_1d) \
1940+ HIPFFTW_DECLARE_DYNAMICALLY_LOADED_FUNCTION_POINTER(prefix, plan_dft_2d) \
1941+ HIPFFTW_DECLARE_DYNAMICALLY_LOADED_FUNCTION_POINTER(prefix, plan_dft_3d) \
1942+ HIPFFTW_DECLARE_DYNAMICALLY_LOADED_FUNCTION_POINTER(prefix, plan_dft) \
1943+ HIPFFTW_DECLARE_DYNAMICALLY_LOADED_FUNCTION_POINTER(prefix, plan_dft_r2c_1d) \
1944+ HIPFFTW_DECLARE_DYNAMICALLY_LOADED_FUNCTION_POINTER(prefix, plan_dft_r2c_2d) \
1945+ HIPFFTW_DECLARE_DYNAMICALLY_LOADED_FUNCTION_POINTER(prefix, plan_dft_r2c_3d) \
1946+ HIPFFTW_DECLARE_DYNAMICALLY_LOADED_FUNCTION_POINTER(prefix, plan_dft_r2c) \
1947+ HIPFFTW_DECLARE_DYNAMICALLY_LOADED_FUNCTION_POINTER(prefix, plan_dft_c2r_1d) \
1948+ HIPFFTW_DECLARE_DYNAMICALLY_LOADED_FUNCTION_POINTER(prefix, plan_dft_c2r_2d) \
1949+ HIPFFTW_DECLARE_DYNAMICALLY_LOADED_FUNCTION_POINTER(prefix, plan_dft_c2r_3d) \
1950+ HIPFFTW_DECLARE_DYNAMICALLY_LOADED_FUNCTION_POINTER(prefix, plan_dft_c2r) \
1951+ HIPFFTW_DECLARE_DYNAMICALLY_LOADED_FUNCTION_POINTER(prefix, print_plan) \
1952+ HIPFFTW_DECLARE_DYNAMICALLY_LOADED_FUNCTION_POINTER(prefix, set_timelimit) \
1953+ HIPFFTW_DECLARE_DYNAMICALLY_LOADED_FUNCTION_POINTER(prefix, cost) \
1954+ HIPFFTW_DECLARE_DYNAMICALLY_LOADED_FUNCTION_POINTER(prefix, flops) \
1955+ static const hipfftw_funcs& get_instance() \
1956+ { \
1957+ static const hipfftw_funcs instance; \
1958+ return instance; \
1959+ } \
1960+ }
1961+
1962+HIPFFTW_FUNCS_SPECIALIZATION(fftwf_, fft_precision_single);
1963+HIPFFTW_FUNCS_SPECIALIZATION(fftw_, fft_precision_double);
1964+
1965+// structure enabling verbosity for hipfftw's exception handler and redirecting std::cerr
1966+// to a runtime buffer throughout its lifetime (unless it was already enabled prior/externally)
1967+struct hipfftw_exception_logger
1968+{
1969+ bool active;
1970+ std::stringstream buffer;
1971+ std::streambuf* const original_cerr_rdbuf = nullptr;
1972+
1973+ std::unique_ptr<EnvironmentSetTemp> hipfftw_temp_logger_env;
1974+
1975+public:
1976+ hipfftw_exception_logger()
1977+ : active(false)
1978+ , original_cerr_rdbuf(std::cerr.rdbuf())
1979+ {
1980+#ifdef __HIP_PLATFORM_AMD__
1981+ const auto env_val = rocfft_getenv("HIPFFTW_LOG_EXCEPTIONS");
1982+ // activate temporary redirection only if not already used otherwise
1983+ // (e.g., in test user's environment )
1984+ if(env_val.empty() || std::stoull(env_val) == 0)
1985+ {
1986+ hipfftw_temp_logger_env
1987+ = std::make_unique<EnvironmentSetTemp>("HIPFFTW_LOG_EXCEPTIONS", "1");
1988+ const auto temp_env_val = rocfft_getenv("HIPFFTW_LOG_EXCEPTIONS");
1989+ active = !temp_env_val.empty() && std::stoull(temp_env_val) != 0;
1990+ }
1991+#endif
1992+ if(active)
1993+ std::cerr.rdbuf(buffer.rdbuf());
1994+ }
1995+ hipfftw_exception_logger(const hipfftw_exception_logger&) = delete;
1996+ hipfftw_exception_logger(hipfftw_exception_logger&&) = delete;
1997+ hipfftw_exception_logger& operator=(const hipfftw_exception_logger&) = delete;
1998+ hipfftw_exception_logger& operator=(hipfftw_exception_logger&&) = delete;
1999+ ~hipfftw_exception_logger()
2000+ {
2001+ if(active)
2002+ {
2003+ // restore cerr to its original state
2004+ std::cerr.rdbuf(original_cerr_rdbuf);
2005+ }
2006+ }
2007+ bool is_active() const
2008+ {
2009+ return active;
2010+ }
2011+ std::string get_log() const
2012+ {
2013+ return buffer.str();
2014+ }
2015+};
2016+
2017+// bit-flagging enum used for representing (combinations of) plan creation
2018+// function(s) to consider
2019+enum hipfftw_plan_creation_func : unsigned
2020+{
2021+ NONE = 0x0, // not to be used (exceptfor validating values)
2022+ PLAN_DFT_ND = 0x1 << 0,
2023+ PLAN_DFT = 0x1 << 1,
2024+ PLAN_MANY = 0x1 << 2,
2025+ PLAN_GURU = 0x1 << 3,
2026+ PLAN_GURU64 = 0x1 << 4,
2027+ ANY = PLAN_DFT_ND | PLAN_DFT | PLAN_MANY | PLAN_GURU | PLAN_GURU64
2028+};
2029+static const std::vector<hipfftw_plan_creation_func> hipfftw_plan_creation_func_candidates
2030+ = {hipfftw_plan_creation_func::PLAN_DFT_ND,
2031+ hipfftw_plan_creation_func::PLAN_DFT,
2032+ hipfftw_plan_creation_func::PLAN_MANY,
2033+ hipfftw_plan_creation_func::PLAN_GURU,
2034+ hipfftw_plan_creation_func::PLAN_GURU64};
2035+
2036+static bool hipfftw_creation_options_are_well_defined(hipfftw_plan_creation_func creation_options)
2037+{
2038+ return creation_options == (creation_options & hipfftw_plan_creation_func::ANY);
2039+}
2040+
2041+static std::string hipfftw_creation_options_to_string(hipfftw_plan_creation_func creation_options,
2042+ fft_transform_type dft_type,
2043+ int intended_rank)
2044+{
2045+ if(!hipfftw_creation_options_are_well_defined(creation_options))
2046+ throw std::invalid_argument(
2047+ "invalid creation_options for hipfftw_creation_options_to_string");
2048+ if(creation_options == hipfftw_plan_creation_func::NONE)
2049+ return "none";
2050+ if(creation_options == hipfftw_plan_creation_func::ANY)
2051+ return "any";
2052+ if(std::find(hipfftw_plan_creation_func_candidates.begin(),
2053+ hipfftw_plan_creation_func_candidates.end(),
2054+ creation_options)
2055+ == hipfftw_plan_creation_func_candidates.end())
2056+ {
2057+ // 2 or more qualifying candidates flagged in creation_options
2058+ std::string ret;
2059+ for(auto candidate : hipfftw_plan_creation_func_candidates)
2060+ {
2061+ if(creation_options & candidate)
2062+ {
2063+ if(!ret.empty())
2064+ ret += "_or_";
2065+ ret += hipfftw_creation_options_to_string(candidate, dft_type, intended_rank);
2066+ }
2067+ }
2068+ return ret;
2069+ }
2070+ // creation_options is one unique qualifying candidate
2071+ std::ostringstream ret;
2072+ const std::string real_or_empty_qualifier
2073+ = is_real(dft_type) ? (is_fwd(dft_type) ? "_r2c" : "_c2r") : "";
2074+ switch(creation_options)
2075+ {
2076+ case hipfftw_plan_creation_func::PLAN_DFT_ND:
2077+ ret << "plan_dft" << real_or_empty_qualifier << "_" << (intended_rank < 0 ? "negative" : "")
2078+ << std::abs(intended_rank) << "d";
2079+ break;
2080+ case hipfftw_plan_creation_func::PLAN_DFT:
2081+ ret << "plan_dft" << real_or_empty_qualifier;
2082+ break;
2083+ case hipfftw_plan_creation_func::PLAN_MANY:
2084+ ret << "plan_many_dft" << real_or_empty_qualifier;
2085+ break;
2086+ case hipfftw_plan_creation_func::PLAN_GURU:
2087+ ret << "plan_guru_dft" << real_or_empty_qualifier;
2088+ break;
2089+ case hipfftw_plan_creation_func::PLAN_GURU64:
2090+ ret << "plan_guru64_dft" << real_or_empty_qualifier;
2091+ break;
2092+ default:
2093+ throw std::runtime_error("hipfftw_creation_options_to_string: internal error encountered "
2094+ "(unexpected value for creation_options)");
2095+ break;
2096+ }
2097+ return ret.str();
2098+}
2099+
2100+template <
2101+ fft_precision prec,
2102+ std::enable_if_t<prec == fft_precision_single || prec == fft_precision_double, bool> = true>
2103+struct hipfftw_plan_bundle_t
2104+{
2105+private:
2106+ const decltype(hipfftw_funcs<prec>::destroy_plan)& plan_destructor;
2107+
2108+public:
2109+ hipfftw_plan_t<prec> plan;
2110+ std::pair<void*, void*> creation_io; // not owned
2111+ hipfftw_plan_creation_func creation_func;
2112+ std::string plan_token; // <-- plan details, except for creation io data pointers
2113+ hipfftw_plan_bundle_t(decltype(plan_destructor) plan_destructor_func)
2114+ : plan_destructor(plan_destructor_func)
2115+ , plan(nullptr)
2116+ , creation_io({nullptr, nullptr})
2117+ , creation_func(hipfftw_plan_creation_func::NONE)
2118+ , plan_token("")
2119+ {
2120+ }
2121+ ~hipfftw_plan_bundle_t()
2122+ {
2123+ // make sure the plan destructor may be used to avoid
2124+ // throwing from the hipfftw_plan_bundle_t destructor
2125+ if(plan_destructor.may_be_used())
2126+ {
2127+ // should be stable even if plan == nullptr;
2128+ plan_destructor(plan);
2129+ }
2130+ else if(plan)
2131+ {
2132+ std::cerr << "WARNING: A " << (prec == fft_precision_single ? "single" : "double")
2133+ << "-precision plan was seemingly created but its destructor cannot be used "
2134+ << std::endl;
2135+ }
2136+ }
2137+ // disable copies and moves
2138+ hipfftw_plan_bundle_t(const hipfftw_plan_bundle_t&) = delete;
2139+ hipfftw_plan_bundle_t& operator=(const hipfftw_plan_bundle_t&) = delete;
2140+ hipfftw_plan_bundle_t(hipfftw_plan_bundle_t&&) = delete;
2141+ hipfftw_plan_bundle_t& operator=(hipfftw_plan_bundle_t&&) = delete;
2142+};
2143+
2144+static bool rank_is_valid_for_hipfftw(int r)
2145+{
2146+ return r > 0;
2147+}
2148+template <typename T, std::enable_if_t<std::is_integral_v<T>, bool> = true>
2149+static bool lengths_are_valid_for_hipfftw_as(const std::vector<ptrdiff_t> len, int intended_rank)
2150+{
2151+ if(!rank_is_valid_for_hipfftw(intended_rank))
2152+ return false; // impossible to validate lengths for an invalid rank
2153+ // check that lengths are all strictly positive and representable with
2154+ // type T without data loss
2155+ return len.size() == intended_rank
2156+ && std::all_of(len.begin(), len.end(), [](const decltype(len)::value_type& val) {
2157+ return val > 0 && val <= std::numeric_limits<T>::max();
2158+ });
2159+}
2160+static bool sign_is_valid_for_hipfftw(int s, const fft_transform_type& dft_kind)
2161+{
2162+ if(is_real(dft_kind))
2163+ return true; // sign is irrelevant for real transforms
2164+ return s == (is_fwd(dft_kind) ? FFTW_FORWARD : FFTW_BACKWARD);
2165+}
2166+static constexpr unsigned hipfftw_valid_flags_mask
2167+ = FFTW_WISDOM_ONLY | FFTW_MEASURE | FFTW_DESTROY_INPUT | FFTW_UNALIGNED | FFTW_CONSERVE_MEMORY
2168+ | FFTW_EXHAUSTIVE | FFTW_PRESERVE_INPUT | FFTW_PATIENT | FFTW_ESTIMATE;
2169+static bool flags_are_valid_for_hipfftw(unsigned f)
2170+{
2171+ return (f & hipfftw_valid_flags_mask) == f;
2172+}
2173+
2174+template <
2175+ fft_precision prec,
2176+ std::enable_if_t<prec == fft_precision_single || prec == fft_precision_double, bool> = true>
2177+struct hipfftw_helper
2178+{
2179+private:
2180+ // plan_bundle stores information about the latest plan possibly created by this
2181+ // object. A shard_ptr is used to make hipfftw_helper safe w.r.t. shallow
2182+ // copies (as required by gtest for parameterized tests).
2183+ // This member is also made mutable so we can release/create it even from a
2184+ // const-qualified objects (e.g., to release owned resources upon test completion,
2185+ // or to re-create the plan at execution if needed or found necessary)
2186+ mutable std::shared_ptr<hipfftw_plan_bundle_t<prec>> plan_bundle;
2187+
2188+ fft_transform_type dft_kind;
2189+ int rank = 0;
2190+ std::vector<ptrdiff_t> lengths;
2191+ fft_result_placement plan_placement;
2192+ int sign = 0;
2193+ unsigned flags = std::numeric_limits<unsigned>::max();
2194+
2195+ template <typename T>
2196+ void reset_member_value(T& member, const T& new_value)
2197+ {
2198+ if(new_value != member)
2199+ {
2200+ member = new_value;
2201+ plan_bundle.reset();
2202+ }
2203+ }
2204+
2205+ hipfftw_plan_creation_func get_creation_func(hipfftw_plan_creation_func creation_options) const
2206+ {
2207+ if(!hipfftw_creation_options_are_well_defined(creation_options))
2208+ throw std::invalid_argument("invalid creation_options for get_creation_func");
2209+ if(!can_use_creation_options(creation_options))
2210+ {
2211+ // e.g., rank < 0 with creation_options == hipfftw_plan_creation_func::PLAN_DFT_ND
2212+ throw std::invalid_argument(
2213+ "The plan creation options "
2214+ + hipfftw_creation_options_to_string(creation_options, dft_kind, rank)
2215+ + " cannot be used with this object");
2216+ }
2217+ std::vector<hipfftw_plan_creation_func> valid_candidates;
2218+ for(auto candidate : hipfftw_plan_creation_func_candidates)
2219+ {
2220+ if(!(creation_options & candidate))
2221+ continue; // candidate is not in given creation_options
2222+ if(can_use_creation_options(candidate))
2223+ {
2224+ // If creation_options != candidate for all candidates, creation_optionsactually
2225+ // combines 2 or more candidates --> only the candidates actually supporting plan
2226+ // creation will be considered "valid". If there exists one (usable) candidate s.t.
2227+ // creation_options == candidate however, this choice is considered "enforced"
2228+ // (e.g. for function-specific argument validation testing purposes)
2229+ if(creation_options == candidate || can_create_plan_with(candidate))
2230+ valid_candidates.push_back(candidate);
2231+ }
2232+ }
2233+ if(valid_candidates.empty())
2234+ return hipfftw_plan_creation_func::NONE;
2235+ // "randomly" (yet reproducibly) choose
2236+ return valid_candidates[std::hash<std::string>()(token()) % valid_candidates.size()];
2237+ }
2238+
2239+ template <bool make_reference_plan = false>
2240+ hipfftw_plan_t<prec>
2241+ make_plan(void* in, void* out, hipfftw_plan_creation_func chosen_creation) const
2242+ {
2243+ if(std::find(hipfftw_plan_creation_func_candidates.begin(),
2244+ hipfftw_plan_creation_func_candidates.end(),
2245+ chosen_creation)
2246+ == hipfftw_plan_creation_func_candidates.end())
2247+ {
2248+ throw std::invalid_argument("Invalid chosen_creation for hipfftw_helper::make_plan");
2249+ }
2250+
2251+ // fetch/infer plan creation function arguments
2252+ const auto& hipfftw_impl = hipfftw_funcs<prec>::get_instance();
2253+ const auto int_len = get_length_as<int>();
2254+ const int* int_len_ptr = int_len.empty() ? nullptr : int_len.data();
2255+
2256+ switch(chosen_creation)
2257+ {
2258+ case hipfftw_plan_creation_func::PLAN_DFT_ND:
2259+ {
2260+ if(!can_use_creation_options(hipfftw_plan_creation_func::PLAN_DFT_ND))
2261+ throw std::runtime_error("hipfftw_plan_creation_func::PLAN_DFT_ND cannot be used.");
2262+ if(rank == 1)
2263+ {
2264+ if(dft_kind == fft_transform_type_real_forward)
2265+ {
2266+ return hipfftw_impl.plan_dft_r2c_1d.template call<make_reference_plan>(
2267+ int_len_ptr[0],
2268+ static_cast<hipfftw_real_t<prec>*>(in),
2269+ static_cast<hipfftw_complex_t<prec>*>(out),
2270+ flags);
2271+ }
2272+ else if(dft_kind == fft_transform_type_real_inverse)
2273+ {
2274+ return hipfftw_impl.plan_dft_c2r_1d.template call<make_reference_plan>(
2275+ int_len_ptr[0],
2276+ static_cast<hipfftw_complex_t<prec>*>(in),
2277+ static_cast<hipfftw_real_t<prec>*>(out),
2278+ flags);
2279+ }
2280+ else
2281+ {
2282+
2283+ return hipfftw_impl.plan_dft_1d.template call<make_reference_plan>(
2284+ int_len_ptr[0],
2285+ static_cast<hipfftw_complex_t<prec>*>(in),
2286+ static_cast<hipfftw_complex_t<prec>*>(out),
2287+ sign,
2288+ flags);
2289+ }
2290+ }
2291+ else if(rank == 2)
2292+ {
2293+ if(dft_kind == fft_transform_type_real_forward)
2294+ {
2295+ return hipfftw_impl.plan_dft_r2c_2d.template call<make_reference_plan>(
2296+ int_len_ptr[0],
2297+ int_len_ptr[1],
2298+ static_cast<hipfftw_real_t<prec>*>(in),
2299+ static_cast<hipfftw_complex_t<prec>*>(out),
2300+ flags);
2301+ }
2302+ else if(dft_kind == fft_transform_type_real_inverse)
2303+ {
2304+
2305+ return hipfftw_impl.plan_dft_c2r_2d.template call<make_reference_plan>(
2306+ int_len_ptr[0],
2307+ int_len_ptr[1],
2308+ static_cast<hipfftw_complex_t<prec>*>(in),
2309+ static_cast<hipfftw_real_t<prec>*>(out),
2310+ flags);
2311+ }
2312+ else
2313+ {
2314+ return hipfftw_impl.plan_dft_2d.template call<make_reference_plan>(
2315+ int_len_ptr[0],
2316+ int_len_ptr[1],
2317+ static_cast<hipfftw_complex_t<prec>*>(in),
2318+ static_cast<hipfftw_complex_t<prec>*>(out),
2319+ sign,
2320+ flags);
2321+ }
2322+ }
2323+ else
2324+ {
2325+ if(dft_kind == fft_transform_type_real_forward)
2326+ {
2327+ return hipfftw_impl.plan_dft_r2c_3d.template call<make_reference_plan>(
2328+ int_len_ptr[0],
2329+ int_len_ptr[1],
2330+ int_len_ptr[2],
2331+ static_cast<hipfftw_real_t<prec>*>(in),
2332+ static_cast<hipfftw_complex_t<prec>*>(out),
2333+ flags);
2334+ }
2335+ else if(dft_kind == fft_transform_type_real_inverse)
2336+ {
2337+ return hipfftw_impl.plan_dft_c2r_3d.template call<make_reference_plan>(
2338+ int_len_ptr[0],
2339+ int_len_ptr[1],
2340+ int_len_ptr[2],
2341+ static_cast<hipfftw_complex_t<prec>*>(in),
2342+ static_cast<hipfftw_real_t<prec>*>(out),
2343+ flags);
2344+ }
2345+ else
2346+ {
2347+ return hipfftw_impl.plan_dft_3d.template call<make_reference_plan>(
2348+ int_len_ptr[0],
2349+ int_len_ptr[1],
2350+ int_len_ptr[2],
2351+ static_cast<hipfftw_complex_t<prec>*>(in),
2352+ static_cast<hipfftw_complex_t<prec>*>(out),
2353+ sign,
2354+ flags);
2355+ }
2356+ }
2357+ }
2358+ break;
2359+ case hipfftw_plan_creation_func::PLAN_DFT:
2360+ {
2361+ if(!can_use_creation_options(hipfftw_plan_creation_func::PLAN_DFT))
2362+ throw std::runtime_error("hipfftw_plan_creation_func::PLAN_DFT cannot be used.");
2363+
2364+ if(dft_kind == fft_transform_type_real_forward)
2365+ {
2366+ return hipfftw_impl.plan_dft_r2c.template call<make_reference_plan>(
2367+ rank,
2368+ int_len_ptr,
2369+ static_cast<hipfftw_real_t<prec>*>(in),
2370+ static_cast<hipfftw_complex_t<prec>*>(out),
2371+ flags);
2372+ }
2373+ else if(dft_kind == fft_transform_type_real_inverse)
2374+ {
2375+ return hipfftw_impl.plan_dft_c2r.template call<make_reference_plan>(
2376+ rank,
2377+ int_len_ptr,
2378+ static_cast<hipfftw_complex_t<prec>*>(in),
2379+ static_cast<hipfftw_real_t<prec>*>(out),
2380+ flags);
2381+ }
2382+ else
2383+ {
2384+ return hipfftw_impl.plan_dft.template call<make_reference_plan>(
2385+ rank,
2386+ int_len_ptr,
2387+ static_cast<hipfftw_complex_t<prec>*>(in),
2388+ static_cast<hipfftw_complex_t<prec>*>(out),
2389+ sign,
2390+ flags);
2391+ }
2392+ }
2393+ break;
2394+ case hipfftw_plan_creation_func::PLAN_MANY:
2395+ [[fallthrough]];
2396+ case hipfftw_plan_creation_func::PLAN_GURU:
2397+ [[fallthrough]];
2398+ case hipfftw_plan_creation_func::PLAN_GURU64:
2399+ throw std::runtime_error("Enforced plan creation is not implemented yet");
2400+ break;
2401+ default:
2402+ throw std::runtime_error("Unknown kind of plan creation");
2403+ break;
2404+ }
2405+ // unreachable
2406+ }
2407+
2408+public:
2409+ hipfftw_helper() = default;
2410+ ~hipfftw_helper() = default;
2411+ hipfftw_helper(hipfftw_helper&& other) = default;
2412+ hipfftw_helper& operator=(hipfftw_helper&& other) = default;
2413+ hipfftw_helper(const hipfftw_helper& other) = default;
2414+ hipfftw_helper& operator=(const hipfftw_helper& rhs) = default;
2415+
2416+ void set_creation_args(fft_transform_type dft_kind_to_set,
2417+ int rank_to_set,
2418+ const std::vector<ptrdiff_t>& lengths_to_set,
2419+ fft_result_placement placement_to_set,
2420+ int sign_to_set,
2421+ unsigned flags_to_set)
2422+ {
2423+ reset_member_value(dft_kind, dft_kind_to_set);
2424+ reset_member_value(rank, rank_to_set);
2425+ reset_member_value(lengths, lengths_to_set);
2426+ reset_member_value(plan_placement, placement_to_set);
2427+ reset_member_value(sign, sign_to_set);
2428+ reset_member_value(flags, flags_to_set);
2429+ }
2430+ // getters
2431+ fft_transform_type get_dft_kind() const
2432+ {
2433+ return dft_kind;
2434+ }
2435+ int get_rank() const
2436+ {
2437+ return rank;
2438+ }
2439+ // returns the lengths as an std::vector<T> if they may all be safely converted to T
2440+ // (the returned vector is empty otherwise)
2441+ template <typename T, std::enable_if_t<std::is_integral_v<T>, bool> = true>
2442+ std::vector<T> get_length_as() const
2443+ {
2444+ if constexpr(std::is_same_v<T, typename decltype(lengths)::value_type>)
2445+ return lengths;
2446+ std::vector<T> ret;
2447+ if(std::any_of(lengths.begin(),
2448+ lengths.end(),
2449+ [](const typename decltype(lengths)::value_type& val) {
2450+ return val < std::numeric_limits<T>::lowest()
2451+ || val > std::numeric_limits<T>::max();
2452+ }))
2453+ {
2454+ // not a safe conversion, return empty lengths
2455+ return ret;
2456+ }
2457+ ret.assign(lengths.begin(), lengths.end());
2458+ return ret;
2459+ }
2460+ fft_result_placement get_placement() const
2461+ {
2462+ return plan_placement;
2463+ }
2464+ int get_sign() const
2465+ {
2466+ return sign;
2467+ }
2468+ unsigned get_flags() const
2469+ {
2470+ return flags;
2471+ }
2472+ std::shared_ptr<hipfftw_plan_bundle_t<prec>> get_plan_bundle() const
2473+ {
2474+ return plan_bundle;
2475+ }
2476+ template <typename T, std::enable_if_t<std::is_integral_v<T>, bool> = true>
2477+ std::vector<T> get_strides_as(fft_io io) const
2478+ {
2479+ if(!rank_is_valid_for_hipfftw(rank) || !has_valid_lengths())
2480+ throw std::runtime_error(
2481+ "cannot calculate default strides with invalid rank or invalid lengths");
2482+ // only default strides for now
2483+ std::vector<ptrdiff_t> strides(rank, 1);
2484+ if(rank > 1)
2485+ {
2486+ if(is_complex(dft_kind))
2487+ strides[rank - 2] = lengths.back();
2488+ else
2489+ {
2490+ if(is_fwd(dft_kind) == (io == fft_io::fft_io_out))
2491+ strides[rank - 2] = lengths.back() / 2 + 1;
2492+ else
2493+ {
2494+ if(plan_placement == fft_placement_inplace)
2495+ strides[rank - 2] = 2 * (lengths.back() / 2 + 1);
2496+ else
2497+ strides[rank - 2] = lengths.back();
2498+ }
2499+ }
2500+ }
2501+ for(auto dim = rank - 3; dim >= 0; dim--)
2502+ strides[dim] = strides[dim + 1] * lengths[dim + 1];
2503+
2504+ std::vector<T> ret;
2505+ if(std::any_of(strides.begin(),
2506+ strides.end(),
2507+ [](const typename decltype(strides)::value_type& val) {
2508+ return val < std::numeric_limits<T>::lowest()
2509+ || val > std::numeric_limits<T>::max();
2510+ }))
2511+ {
2512+ // not a safe conversion, return empty lengths
2513+ return ret;
2514+ }
2515+ ret.assign(strides.begin(), strides.end());
2516+ return ret;
2517+ }
2518+ template <typename T, std::enable_if_t<std::is_integral_v<T>, bool> = true>
2519+ T get_dist_as(fft_io io) const
2520+ {
2521+ if(!rank_is_valid_for_hipfftw(rank) || !has_valid_lengths())
2522+ throw std::runtime_error(
2523+ "cannot calculate default distance(s) with invalid rank or invalid lengths");
2524+ // only default distances for now
2525+ ptrdiff_t dist = 0;
2526+ if(rank == 1)
2527+ {
2528+ if(is_complex(dft_kind))
2529+ dist = lengths.back();
2530+ else
2531+ {
2532+ if(is_fwd(dft_kind) == (io == fft_io::fft_io_out))
2533+ dist = lengths.back() / 2 + 1;
2534+ else
2535+ {
2536+ if(plan_placement == fft_placement_inplace)
2537+ dist = 2 * (lengths.back() / 2 + 1);
2538+ else
2539+ dist = lengths.back();
2540+ }
2541+ }
2542+ }
2543+ else
2544+ {
2545+ const auto strides = get_strides_as<ptrdiff_t>(io);
2546+ dist = strides.front() * lengths.front();
2547+ }
2548+ if(dist < std::numeric_limits<T>::lowest() || dist > std::numeric_limits<T>::max())
2549+ throw std::runtime_error("distance cannot be safely converted to the desired type");
2550+ return static_cast<T>(dist);
2551+ }
2552+ template <typename T, std::enable_if_t<std::is_integral_v<T>, bool> = true>
2553+ T get_nbatch_as(fft_io io) const
2554+ {
2555+ // only unbatched for now
2556+ T ret = 1;
2557+ return ret;
2558+ }
2559+
2560+ // validity checks
2561+ bool has_valid_rank() const
2562+ {
2563+ return rank_is_valid_for_hipfftw(rank);
2564+ }
2565+ bool has_valid_lengths() const
2566+ {
2567+ return lengths_are_valid_for_hipfftw_as<ptrdiff_t>(lengths, rank);
2568+ }
2569+ bool has_valid_sign() const
2570+ {
2571+ return sign_is_valid_for_hipfftw(sign, dft_kind);
2572+ }
2573+ bool has_valid_flags() const
2574+ {
2575+ return flags_are_valid_for_hipfftw(flags);
2576+ }
2577+ // checks if the current parameters can be used with (any of) the given option(s) of
2578+ // plan creation (NOT whether they're valid or not). For instance, one cannot possibly
2579+ // communicate rank > 3 with hipfftw_plan_creation_func::PLAN_DFT_ND, or communicate
2580+ // non-default strides with hipfftw_plan_creation_func::PLAN_DFT_ND or
2581+ // hipfftw_plan_creation_func::PLAN_DFT...
2582+ // TODO: expand logic when extra configuration parameters are added (e.g. batch sizes,
2583+ // strides, etc.)
2584+ bool can_use_creation_options(hipfftw_plan_creation_func creation_options) const
2585+ {
2586+ if(!hipfftw_creation_options_are_well_defined(creation_options))
2587+ throw std::invalid_argument(
2588+ "ill-defined creation_options used in can_use_creation_options");
2589+ if(creation_options == hipfftw_plan_creation_func::NONE)
2590+ return false;
2591+ if(std::find(hipfftw_plan_creation_func_candidates.begin(),
2592+ hipfftw_plan_creation_func_candidates.end(),
2593+ creation_options)
2594+ == hipfftw_plan_creation_func_candidates.end())
2595+ {
2596+ // creation_options combines several candidates in hipfftw_plan_creation_func_candidates
2597+ // --> parse them individually and find out if any applicable can be used
2598+ return std::any_of(hipfftw_plan_creation_func_candidates.begin(),
2599+ hipfftw_plan_creation_func_candidates.end(),
2600+ [=](const hipfftw_plan_creation_func& candidate) {
2601+ return (creation_options & candidate)
2602+ && can_use_creation_options(candidate);
2603+ });
2604+ }
2605+ // "creation_options" actually is an individual value in hipfftw_plan_creation_func_candidates
2606+ switch(creation_options)
2607+ {
2608+ case hipfftw_plan_creation_func::PLAN_DFT_ND:
2609+ // rank is not passed as an argument but dictated by the called function,
2610+ // (must be 1, 2, or 3), and as many lengths must be passed as individual
2611+ // integer values
2612+ return (rank == 1 || rank == 2 || rank == 3) && get_length_as<int>().size() == rank;
2613+ break;
2614+ case hipfftw_plan_creation_func::PLAN_DFT:
2615+ // the lengths must be representable as integers, if not empty (supposedly
2616+ // intentionally, e.g., for input validation testing purposes)
2617+ return lengths.empty() || get_length_as<int>().size() == rank;
2618+ break;
2619+ case hipfftw_plan_creation_func::PLAN_MANY:
2620+ [[fallthrough]];
2621+ case hipfftw_plan_creation_func::PLAN_GURU:
2622+ [[fallthrough]];
2623+ case hipfftw_plan_creation_func::PLAN_GURU64:
2624+ return false;
2625+ break;
2626+ default:
2627+ throw std::runtime_error("hipfftw_helper: internal error encountered (unexpected value "
2628+ "for creation_options)");
2629+ break;
2630+ }
2631+ // unreachable
2632+ }
2633+
2634+ // checks validity of configuration parameters and whether creation can be
2635+ // attempted via (any of) the given option(s)
2636+ bool is_valid_for_creation_with(hipfftw_plan_creation_func creation_options) const
2637+ {
2638+ if(!hipfftw_creation_options_are_well_defined(creation_options))
2639+ throw std::invalid_argument("invalid creation_options for is_valid_for_creation_with");
2640+
2641+ // TODO: expand the global validity checks below when this struct is
2642+ // expanded to cover more configurations (e.g., batching, srides, etc.)
2643+ return has_valid_rank() && has_valid_lengths() && has_valid_sign() && has_valid_flags()
2644+ && can_use_creation_options(creation_options);
2645+ }
2646+ bool is_valid_for_creation() const
2647+ {
2648+ return is_valid_for_creation_with(hipfftw_plan_creation_func::ANY);
2649+ }
2650+ // check expected support by (any of) the given option(s)
2651+ bool has_unsupported_args_for(hipfftw_plan_creation_func creation_options) const
2652+ {
2653+ // extra conditions for configurations supported by hipfftw:
2654+ if(rank > 3)
2655+ return true;
2656+ if(flags & FFTW_WISDOM_ONLY)
2657+ return true;
2658+ if(dft_kind == fft_transform_type_real_inverse && rank > 1 && (flags & FFTW_PRESERVE_INPUT))
2659+ return true;
2660+ if(!(creation_options & hipfftw_plan_creation_func::PLAN_GURU64) && has_valid_rank()
2661+ && has_valid_lengths())
2662+ {
2663+ // cannot handle data sizes involving more elements than the
2664+ // largest representable int value
2665+ if(get_num_elements_in(fft_io_in) > std::numeric_limits<int>::max()
2666+ || get_num_elements_in(fft_io_out) > std::numeric_limits<int>::max())
2667+ return true;
2668+ }
2669+ return false;
2670+ }
2671+ bool can_create_plan_with(hipfftw_plan_creation_func creation_options) const
2672+ {
2673+ if(!hipfftw_creation_options_are_well_defined(creation_options))
2674+ throw std::invalid_argument("invalid creation_option for can_create_plan_with");
2675+
2676+ if(!is_valid_for_creation_with(creation_options))
2677+ return false;
2678+ if(has_unsupported_args_for(creation_options))
2679+ return false;
2680+ return true;
2681+ }
2682+ bool can_create_plan() const
2683+ {
2684+ return can_create_plan_with(hipfftw_plan_creation_func::ANY);
2685+ }
2686+ // create a token consistent with other tests to enable kernel precompilation
2687+ // for valid cases, and/or capturing all required details about members otherwise
2688+ std::string token() const
2689+ {
2690+ std::ostringstream ret;
2691+ switch(dft_kind)
2692+ {
2693+ case fft_transform_type_complex_forward:
2694+ ret << "complex_forward";
2695+ break;
2696+ case fft_transform_type_complex_inverse:
2697+ ret << "complex_inverse";
2698+ break;
2699+ case fft_transform_type_real_forward:
2700+ ret << "real_forward";
2701+ break;
2702+ case fft_transform_type_real_inverse:
2703+ ret << "real_inverse";
2704+ break;
2705+ default:
2706+ throw std::runtime_error("unknown type of transform");
2707+ }
2708+
2709+ // report rank if invalid
2710+ if(!has_valid_rank() || lengths.empty())
2711+ ret << "_invalid_rank" << (rank < 0 ? "_negative_" : "_") << std::abs(rank);
2712+ ret << "_len";
2713+ if(lengths.empty())
2714+ ret << "_none";
2715+ else
2716+ {
2717+ for(const auto& len : lengths)
2718+ ret << (len < 0 ? "_negative_" : "_") << std::abs(len);
2719+ }
2720+ if constexpr(prec == fft_precision_single)
2721+ ret << "_single";
2722+ else
2723+ ret << "_double";
2724+ ret << (plan_placement == fft_placement_inplace ? "_ip" : "_op");
2725+ // only supporting unbatched cases as of now
2726+ ret << "_batch_1";
2727+ if(has_valid_rank() && has_valid_lengths())
2728+ {
2729+ ret << "_istride";
2730+ for(const auto& stride : get_strides_as<size_t>(fft_io::fft_io_in))
2731+ ret << "_" << stride;
2732+ if(!is_real(dft_kind))
2733+ ret << "_CI";
2734+ else if(dft_kind == fft_transform_type_real_forward)
2735+ ret << "_R";
2736+ else
2737+ ret << "_HI";
2738+ ret << "_ostride";
2739+ for(const auto& stride : get_strides_as<size_t>(fft_io::fft_io_out))
2740+ ret << "_" << stride;
2741+ if(!is_real(dft_kind))
2742+ ret << "_CI";
2743+ else if(dft_kind == fft_transform_type_real_forward)
2744+ ret << "_HI";
2745+ else
2746+ ret << "_R";
2747+ ret << "_idist_" << get_dist_as<size_t>(fft_io::fft_io_in);
2748+ ret << "_odist_" << get_dist_as<size_t>(fft_io::fft_io_out);
2749+ ret << "_ioffset_0_0_ooffset_0_0";
2750+ }
2751+ if(!has_valid_sign())
2752+ ret << "_invalid_sign" << (sign < 0 ? "_negative_" : "_") << std::abs(sign);
2753+ ret << "_flags_" << flags;
2754+ return ret.str();
2755+ }
2756+ // create_plan invokes an hipfftw plan creation function for the object's configuration
2757+ // parameters, the corresponding plan pointer returned by hipfftw is stored internally.
2758+ // IMPORTANT NOTE: if one wants to target a specific creation function (as represented
2759+ // by any value in hipfftw_plan_creation_func_candidates), setting the creation_options
2760+ // argument to that specific value effectively bypasses the verification that the
2761+ // object's configuration is actually (expected to be) supported and attempts the plan
2762+ // creation anyways (unless it simply cannot be done, e.g., attempting
2763+ // creation_options = hipfftw_plan_creation_func::PLAN_DFT_ND herein on an object
2764+ // holding a value for rank > 3 simply cannot be done)
2765+ void create_plan(void* in,
2766+ void* out,
2767+ hipfftw_plan_creation_func creation_options
2768+ = hipfftw_plan_creation_func::ANY) const
2769+ {
2770+ const auto& hipfftw_impl = hipfftw_funcs<prec>::get_instance();
2771+ const hipfftw_plan_creation_func chosen_option = get_creation_func(creation_options);
2772+ if(chosen_option == hipfftw_plan_creation_func::NONE)
2773+ {
2774+ plan_bundle = std::make_shared<hipfftw_plan_bundle_t<prec>>(hipfftw_impl.destroy_plan);
2775+ plan_bundle->creation_io = {in, out};
2776+ plan_bundle->plan = nullptr;
2777+ plan_bundle->creation_func = chosen_option;
2778+ plan_bundle->plan_token = "";
2779+ return;
2780+ }
2781+ // early return if there is no need to (re)build
2782+ if(plan_bundle && plan_bundle->plan_token == token() && plan_bundle->creation_io.first == in
2783+ && plan_bundle->creation_io.second == out && plan_bundle->creation_func == chosen_option)
2784+ return;
2785+
2786+ // create the desired plan
2787+ plan_bundle = std::make_shared<hipfftw_plan_bundle_t<prec>>(hipfftw_impl.destroy_plan);
2788+ plan_bundle->plan = make_plan(in, out, chosen_option);
2789+ plan_bundle->creation_io = {in, out};
2790+ plan_bundle->creation_func = chosen_option;
2791+ plan_bundle->plan_token = token();
2792+ }
2793+
2794+ // returns a reference FFTW plan for the current configuration
2795+ // The returned plan is NOT owned by this object!
2796+ hipfftw_plan_t<prec> get_reference_plan(void* in,
2797+ void* out,
2798+ hipfftw_plan_creation_func creation_options
2799+ = hipfftw_plan_creation_func::ANY) const
2800+ {
2801+ const hipfftw_plan_creation_func chosen_option = get_creation_func(creation_options);
2802+ if(chosen_option == hipfftw_plan_creation_func::NONE)
2803+ {
2804+ return nullptr;
2805+ }
2806+ constexpr bool make_reference_plan = true;
2807+ return make_plan<make_reference_plan>(in, out, chosen_option);
2808+ }
2809+
2810+ void execute(void* execute_in, void* execute_out) const
2811+ {
2812+ if(!plan_bundle || plan_bundle->plan_token != token())
2813+ {
2814+ // plan is not created or possibly not up-to-date
2815+ create_plan(execute_in, execute_out);
2816+ }
2817+
2818+ const auto& hipfftw_impl = hipfftw_funcs<prec>::get_instance();
2819+ if(execute_in == plan_bundle->creation_io.first
2820+ && execute_out == plan_bundle->creation_io.second)
2821+ {
2822+ hipfftw_impl.execute(plan_bundle->plan);
2823+ }
2824+ else
2825+ {
2826+ throw std::runtime_error("New-array execution functions not implemented yet.");
2827+ }
2828+ }
2829+
2830+ // TODO: revise/expand logic below when the structure is expanded for more cases (batches,
2831+ // non-default strides, etc.)
2832+ size_t get_num_elements_in(fft_io in_or_out) const
2833+ {
2834+ if(in_or_out != fft_io_in && in_or_out != fft_io_out)
2835+ throw std::invalid_argument("invalid in_or_out for get_num_elements_in");
2836+ if(!has_valid_rank() || !has_valid_lengths())
2837+ throw std::runtime_error("get_num_elements_in requires valid rank and lengths");
2838+ const auto tmp = get_length_as<size_t>();
2839+ if(tmp.empty() || tmp.size() != rank)
2840+ {
2841+ throw std::runtime_error(
2842+ "get_num_elements_in failed to correctly convert lengths to size_t values");
2843+ }
2844+ size_t num_elems = 1;
2845+ if(is_complex(dft_kind))
2846+ {
2847+ num_elems *= tmp[rank - 1];
2848+ }
2849+ else
2850+ {
2851+ const size_t cmplx_len = tmp[rank - 1] / 2 + 1;
2852+ if(is_fwd(dft_kind) == (in_or_out == fft_io_out))
2853+ num_elems *= cmplx_len;
2854+ else
2855+ num_elems
2856+ *= plan_placement == fft_placement_inplace ? 2 * cmplx_len : tmp[rank - 1];
2857+ }
2858+ num_elems *= product(tmp.begin(), tmp.begin() + rank - 1);
2859+ return num_elems;
2860+ }
2861+
2862+ size_t get_data_byte_size(fft_io in_or_out) const
2863+ {
2864+ if(in_or_out != fft_io_in && in_or_out != fft_io_out)
2865+ throw std::invalid_argument("invalid in_or_out for get_data_byte_size");
2866+ // for in-place, input and output data sizes are enforced equal
2867+ std::vector<fft_io> io_range_to_consider = {in_or_out};
2868+ if(plan_placement == fft_placement_inplace)
2869+ io_range_to_consider.push_back(in_or_out == fft_io::fft_io_in ? fft_io::fft_io_out
2870+ : fft_io::fft_io_in);
2871+
2872+ size_t ret = 0;
2873+ for(auto io : io_range_to_consider)
2874+ {
2875+ const size_t num_elems = get_num_elements_in(io);
2876+ if(is_complex(dft_kind) || (is_fwd(dft_kind) == (io == fft_io_out)))
2877+ ret = std::max(ret, num_elems * sizeof(hipfftw_complex_t<prec>));
2878+ else
2879+ ret = std::max(ret, num_elems * sizeof(hipfftw_real_t<prec>));
2880+ }
2881+ return ret;
2882+ }
2883+ void release_plan() const
2884+ {
2885+ plan_bundle.reset();
2886+ }
2887+};
2888+
2889+#endif
2890diff --git a/clients/samples/CMakeLists.txt b/clients/samples/CMakeLists.txt
2891index 488e09d..37cc924 100644
2892--- a/clients/samples/CMakeLists.txt
2893+++ b/clients/samples/CMakeLists.txt
2894@@ -124,8 +124,15 @@ foreach( sample ${sample_list} )
2895
2896 endforeach()
2897
2898-# cuFFT callback code must be compiled with -dc to enable relocatable
2899-# device code
2900-if( BUILD_WITH_LIB STREQUAL "CUDA" AND hipfft_callback IN_LIST sample_list )
2901- target_compile_options( hipfft_callback PRIVATE -dc )
2902+# callback code must be compiled as relocatable device code
2903+if( hipfft_callback IN_LIST sample_list )
2904+ if( BUILD_WITH_LIB STREQUAL "CUDA" )
2905+ target_compile_options( hipfft_callback PRIVATE -dc )
2906+ else()
2907+ # -fgpu-rdc causes failure at link stage on Windows
2908+ if (NOT WIN32)
2909+ target_compile_options( hipfft_callback PRIVATE -fgpu-rdc )
2910+ target_link_options( hipfft_callback PRIVATE -fgpu-rdc )
2911+ endif()
2912+ endif()
2913 endif()
2914diff --git a/clients/samples/hipfft_callback.cpp b/clients/samples/hipfft_callback.cpp
2915index b365e87..8a77266 100644
2916--- a/clients/samples/hipfft_callback.cpp
2917+++ b/clients/samples/hipfft_callback.cpp
2918@@ -20,6 +20,7 @@
2919 // THE SOFTWARE.
2920
2921 #include <iostream>
2922+#ifndef WIN32
2923 #include <vector>
2924
2925 #include <hip/hip_runtime.h>
2926@@ -48,9 +49,15 @@ __device__ hipfftDoubleComplex load_callback(hipfftDoubleComplex* input,
2927 }
2928
2929 __device__ auto load_callback_dev = load_callback;
2930+#endif
2931
2932 int main()
2933 {
2934+#ifdef WIN32
2935+ std::cout << "This sample is temporarily disabled on Windows" << std::endl;
2936+ return EXIT_SUCCESS;
2937+#else
2938+
2939 std::cout << "hipfft 1D double-precision complex-to-complex transform with callback\n";
2940
2941 const int Nx = 8;
2942@@ -156,4 +163,5 @@ int main()
2943 throw std::runtime_error("hipFree failed");
2944
2945 return 0;
2946+#endif
2947 }
2948diff --git a/clients/tests/CMakeLists.txt b/clients/tests/CMakeLists.txt
2949index 28c6594..91024cf 100644
2950--- a/clients/tests/CMakeLists.txt
2951+++ b/clients/tests/CMakeLists.txt
2952@@ -32,6 +32,13 @@ else( )
2953 "Install path prefix, prepended onto install directories" )
2954 endif( )
2955
2956+# Dependencies
2957+
2958+find_package( ROCmCMakeBuildTools REQUIRED CONFIG PATHS /opt/rocm )
2959+include(ROCMInstallTargets)
2960+list( APPEND CMAKE_MODULE_PATH ${CMAKE_CURRENT_SOURCE_DIR}/../cmake )
2961+
2962+
2963 # This has to be initialized before the project() command appears
2964 # Set the default of CMAKE_BUILD_TYPE to be release, unless user
2965 # specifies with -D. MSVC_IDE does not use CMAKE_BUILD_TYPE
2966@@ -42,13 +49,19 @@ endif()
2967
2968 project( hipfft-clients-tests LANGUAGES CXX )
2969
2970+if( NOT HIPFFT_BUILD_SCOPE )
2971+ find_package( hipfft REQUIRED CONFIG PATHS )
2972+endif()
2973
2974 find_package( Boost REQUIRED)
2975
2976 set( Boost_USE_STATIC_LIBS OFF )
2977
2978+
2979 find_package( FFTW 3.0 REQUIRED MODULE COMPONENTS FLOAT DOUBLE )
2980
2981+set( BUILD_WITH_LIB "ROCM" CACHE STRING "Build ${PROJECT_NAME} with ROCM or CUDA libraries" )
2982+
2983 set( THREADS_PREFER_PTHREAD_FLAG ON )
2984 find_package( Threads REQUIRED )
2985
2986@@ -60,7 +73,9 @@ set( hipfft-test_source
2987 accuracy_test_2D.cpp
2988 accuracy_test_3D.cpp
2989 accuracy_test_callback.cpp
2990+ hipfftw_test.cpp
2991 multi_device_test.cpp
2992+ multi_stream_test.cpp
2993 ../../shared/array_validator.cpp
2994 )
2995
2996@@ -84,7 +99,7 @@ if( NOT BUILD_WITH_LIB STREQUAL "CUDA" )
2997 if( WIN32 )
2998 find_package( HIP CONFIG REQUIRED )
2999 else()
3000- find_package( HIP MODULE REQUIRED )
3001+ find_package( hip REQUIRED CONFIG PATHS /opt/rocm/lib/cmake/hip/ )
3002 endif()
3003 endif()
3004
3005@@ -98,6 +113,20 @@ endif()
3006 string( CONCAT TESTS_OUT_DIR "${PROJECT_BINARY_DIR}" ${TESTS_OUT_DIR} )
3007
3008 option( BUILD_CLIENTS_TESTS_OPENMP "Build tests with OpenMP" ON )
3009+if( BUILD_CLIENTS_TESTS_OPENMP AND NOT BUILD_WITH_LIB STREQUAL "CUDA" )
3010+ # Attempt to find a config version, which provides openmp_LIB_DIR.
3011+ find_package( OpenMP CONFIG PATHS "${HIP_CLANG_ROOT}/lib/cmake" )
3012+ if( NOT OPENMP_FOUND OR NOT DEFINED ${openmp_LIB_DIR} )
3013+ # Fall-back to module mode.
3014+ find_package( OpenMP REQUIRED )
3015+ set( BUILD_RPATH "${HIP_CLANG_ROOT}/lib" )
3016+ set( INSTALL_RPATH "$ORIGIN/../llvm/lib" )
3017+ else()
3018+ set( BUILD_RPATH "${HIP_CLANG_ROOT}/${openmp_LIB_DIR}" )
3019+ set( INSTALL_RPATH "$ORIGIN/../llvm/${openmp_LIB_DIR}" )
3020+
3021+ endif()
3022+endif()
3023
3024 foreach( target ${TEST_TARGETS} )
3025 set_target_properties( ${target} PROPERTIES
3026@@ -105,6 +134,15 @@ foreach( target ${TEST_TARGETS} )
3027 CXX_STANDARD_REQUIRED ON
3028 )
3029
3030+ if( BUILD_CLIENTS_TESTS_OPENMP )
3031+ set_target_properties( ${TEST_TARGETS} PROPERTIES
3032+ BUILD_RPATH "${BUILD_RPATH}"
3033+ )
3034+ set_target_properties( ${TEST_TARGETS} PROPERTIES
3035+ INSTALL_RPATH "${INSTALL_RPATH}"
3036+ )
3037+ endif()
3038+
3039 if( BUILD_WITH_LIB STREQUAL "ROCM" )
3040 target_compile_options( ${target} PRIVATE ${WARNING_FLAGS} )
3041 target_link_libraries( ${target}
3042@@ -112,7 +150,7 @@ foreach( target ${TEST_TARGETS} )
3043 hip::host
3044 hip::device
3045 )
3046- foreach( gpu_target ${AMDGPU_TARGETS} )
3047+ foreach( gpu_target ${GPU_TARGETS} )
3048 target_compile_options( ${target} PRIVATE --offload-arch=${gpu_target} )
3049 endforeach()
3050
3051@@ -140,30 +178,12 @@ foreach( target ${TEST_TARGETS} )
3052 target_compile_definitions( ${target} PUBLIC _CUFFT_BACKEND )
3053 endif()
3054
3055- if( BUILD_CLIENTS_TESTS_OPENMP )
3056- if( BUILD_WITH_LIB STREQUAL "CUDA" )
3057- message( STATUS "OpenMP is not supported on CUDA, building tests without it" )
3058- else()
3059- target_compile_options( ${target} PRIVATE -DBUILD_CLIENTS_TESTS_OPENMP )
3060- if(NOT (CMAKE_CXX_COMPILER MATCHES ".*hipcc$" OR CMAKE_CXX_COMPILER MATCHES ".*clang\\+\\+"))
3061- target_compile_options( ${target} PRIVATE -fopenmp )
3062- target_link_libraries( ${target} PRIVATE -fopenmp -L${HIP_CLANG_ROOT}/lib -Wl,-rpath=${HIP_CLANG_ROOT}/lib )
3063- target_include_directories( ${target} PRIVATE ${HIP_CLANG_ROOT}/include )
3064- else()
3065- if(CMAKE_CXX_COMPILER_ID STREQUAL "Clang")
3066- target_compile_options( ${target} PRIVATE -fopenmp=libomp )
3067- target_link_options( ${target} PRIVATE -fopenmp=libomp )
3068- endif()
3069- endif()
3070- endif()
3071- endif()
3072
3073 target_include_directories( ${target}
3074 PRIVATE
3075 $<BUILD_INTERFACE:${Boost_INCLUDE_DIRS}>
3076 $<BUILD_INTERFACE:${FFTW_INCLUDE_DIRS}>
3077 $<BUILD_INTERFACE:${hip_INCLUDE_DIRS}>
3078- $<BUILD_INTERFACE:${CMAKE_CURRENT_SOURCE_DIR}/../../library/include>
3079 )
3080
3081 target_link_libraries( ${target}
3082@@ -171,6 +191,18 @@ foreach( target ${TEST_TARGETS} )
3083 hip::hipfft
3084 ${FFTW_LIBRARIES}
3085 )
3086+
3087+ if( BUILD_CLIENTS_TESTS_OPENMP )
3088+ if( BUILD_WITH_LIB STREQUAL "CUDA" )
3089+ message( STATUS "OpenMP is not supported on CUDA, building tests without it" )
3090+ else()
3091+ if( DEFINED ${openmp_LIB_DIR} )
3092+ set_target_properties( ${target} PROPERTIES BUILD_RPATH "${HIP_CLANG_ROOT}/${openmp_LIB_DIR}" )
3093+ set_target_properties( ${target} PROPERTIES INSTALL_RPATH "${HIP_CLANG_ROOT}/${openmp_LIB_DIR}" )
3094+ endif()
3095+ target_link_libraries( ${target} PRIVATE OpenMP::OpenMP_CXX )
3096+ endif()
3097+ endif()
3098
3099 if( HIPFFT_MPI_ENABLE )
3100 target_link_libraries( ${target}
3101@@ -187,15 +219,29 @@ foreach( target ${TEST_TARGETS} )
3102 rocm_install(TARGETS ${target} COMPONENT tests)
3103 endforeach()
3104
3105-if( GTEST_FOUND )
3106+find_package( GTest 1.11.0 )
3107+
3108+if( GTest_FOUND )
3109 target_include_directories( hipfft-test PRIVATE $<BUILD_INTERFACE:${GTEST_INCLUDE_DIRS}> )
3110- target_link_libraries( hipfft-test PRIVATE ${GTEST_LIBRARIES} )
3111+ target_link_libraries( hipfft-test PRIVATE GTest::gtest )
3112 else()
3113+ # gtest build by the hipFFT
3114 add_dependencies( hipfft-test gtest )
3115 target_include_directories( hipfft-test PRIVATE hipfft-test_include_dirs ${GTEST_INCLUDE_DIRS} )
3116 target_link_libraries( hipfft-test PRIVATE ${GTEST_LIBRARIES} )
3117 endif()
3118
3119+# tests have callback functions, which need to be built as relocatable device code
3120+if( BUILD_WITH_LIB STREQUAL "CUDA" )
3121+ target_compile_options( hipfft-test PRIVATE -dc )
3122+else()
3123+ # -fgpu-rdc causes failure at link stage on Windows
3124+ if (NOT WIN32)
3125+ target_compile_options( hipfft-test PRIVATE -fgpu-rdc )
3126+ target_link_options( hipfft-test PRIVATE -fgpu-rdc )
3127+ endif()
3128+endif()
3129+
3130 if(FFTW_MULTITHREAD)
3131 target_compile_options( hipfft-test PRIVATE -DFFTW_MULTITHREAD )
3132 endif( )
3133@@ -203,8 +249,14 @@ endif( )
3134 target_link_libraries( hipfft-test
3135 PRIVATE
3136 Threads::Threads
3137+ ${CMAKE_DL_LIBS}
3138 )
3139
3140+# hipfft-test will opens the hipfftw library but does not link to it
3141+if( TARGET hipfftw )
3142+ add_dependencies( hipfft-test hipfftw )
3143+endif()
3144+
3145 if (WIN32)
3146
3147 # Ensure tests run with HIP DLLs and not anything the driver owns
3148@@ -222,3 +274,42 @@ if (WIN32)
3149 add_custom_command( TARGET hipfft-test POST_BUILD COMMAND ${CMAKE_COMMAND} ARGS -E copy ${file_i} $<TARGET_FILE_DIR:hipfft-test> )
3150 endforeach( file_i )
3151 endif()
3152+
3153+option(BUILD_CODE_COVERAGE "Build with code coverage flags (clang only)" OFF)
3154+set(COVERAGE_TEST_OPTIONS "--smoketest;--gtest_filter=-*call*" CACHE STRING "Command line arguments for hipfft-test when generating a code coverage report (Note: an additional run of hipfft-test targeting multi_gpu* and callback* tests is always executed and coverage results are aggregated)")
3155+if (BUILD_CODE_COVERAGE)
3156+ add_custom_target(
3157+ code_cov_tests
3158+ DEPENDS hipfft-test
3159+ COMMAND ${CMAKE_COMMAND} -E rm -rf ./coverage-report
3160+ COMMAND ${CMAKE_COMMAND} -E make_directory ./coverage-report/profraw
3161+ COMMAND ${CMAKE_COMMAND} -E env LLVM_PROFILE_FILE="./coverage-report/profraw/hipfft-coverage_%p.profraw" GTEST_LISTENER=NO_PASS_LINE_IN_LOG $<TARGET_FILE:hipfft-test> --precompile=./clients/staging/hipfft-test-precompile.db ${COVERAGE_TEST_OPTIONS}
3162+ COMMAND ${CMAKE_COMMAND} -E env LLVM_PROFILE_FILE="./coverage-report/profraw/hipfft-coverage_%p.profraw" GTEST_LISTENER=NO_PASS_LINE_IN_LOG $<TARGET_FILE:hipfft-test> --precompile=./clients/staging/hipfft-test-precompile-multi_gpu-plus-callback.db --gtest_filter=multi_gpu*:callback*
3163+ WORKING_DIRECTORY ${CMAKE_BINARY_DIR}
3164+ )
3165+
3166+ find_program(
3167+ LLVM_PROFDATA
3168+ llvm-profdata
3169+ REQUIRED
3170+ HINTS ${ROCM_PATH}/llvm/bin
3171+ PATHS /opt/rocm/llvm/bin
3172+ )
3173+
3174+ find_program(
3175+ LLVM_COV
3176+ llvm-cov
3177+ REQUIRED
3178+ HINTS ${ROCM_PATH}/llvm/bin
3179+ PATHS /opt/rocm/llvm/bin
3180+ )
3181+
3182+ add_custom_target(
3183+ coverage
3184+ DEPENDS code_cov_tests
3185+ COMMAND ${LLVM_PROFDATA} merge -sparse ./coverage-report/profraw/hipfft-coverage_*.profraw -o ./coverage-report/hipfft.profdata
3186+ COMMAND ${LLVM_COV} report -object ./library/libhipfftw.so -object ./library/libhipfft.so -instr-profile=./coverage-report/hipfft.profdata
3187+ COMMAND ${LLVM_COV} show -object ./library/libhipfftw.so -object ./library/libhipfft.so -instr-profile=./coverage-report/hipfft.profdata -format=html -output-dir=coverage-report
3188+ WORKING_DIRECTORY ${CMAKE_BINARY_DIR}
3189+ )
3190+endif()
3191diff --git a/clients/tests/accuracy_test_1D.cpp b/clients/tests/accuracy_test_1D.cpp
3192index 91e238e..b2aa382 100644
3193--- a/clients/tests/accuracy_test_1D.cpp
3194+++ b/clients/tests/accuracy_test_1D.cpp
3195@@ -81,14 +81,8 @@ static std::vector<size_t> small_1D_sizes()
3196 static const size_t SMALL_1D_MAX = 8192;
3197
3198 // generate a list of sizes from 2 and up, skipping any sizes that are already covered
3199- std::vector<size_t> covered_sizes;
3200- std::copy(pow2_range.begin(), pow2_range.end(), std::back_inserter(covered_sizes));
3201- std::copy(pow3_range.begin(), pow3_range.end(), std::back_inserter(covered_sizes));
3202- std::copy(pow5_range.begin(), pow5_range.end(), std::back_inserter(covered_sizes));
3203- std::copy(radX_range.begin(), radX_range.end(), std::back_inserter(covered_sizes));
3204- std::copy(mix_range.begin(), mix_range.end(), std::back_inserter(covered_sizes));
3205- std::copy(prime_range.begin(), prime_range.end(), std::back_inserter(covered_sizes));
3206- std::sort(covered_sizes.begin(), covered_sizes.end());
3207+ std::vector<size_t> covered_sizes = merge_and_sort_values<size_t>(
3208+ {pow2_range, pow3_range, pow5_range, radX_range, mix_range, prime_range});
3209
3210 std::vector<size_t> output;
3211 for(size_t i = 2; i < SMALL_1D_MAX; ++i)
3212@@ -319,29 +313,24 @@ INSTANTIATE_TEST_SUITE_P(DISABLED_offset_mix_1D,
3213
3214 // small 1D sizes just need to make sure our factorization isn't
3215 // completely broken, so we just check simple C2C outplace interleaved
3216-INSTANTIATE_TEST_SUITE_P(small_1D,
3217- accuracy_test,
3218- ::testing::ValuesIn(param_generator_base(
3219- test_prob,
3220- {fft_transform_type_complex_forward},
3221- generate_lengths({small_1D_sizes()}),
3222- {fft_precision_single},
3223- {1},
3224- [](fft_transform_type t,
3225- const std::vector<fft_result_placement>& place_range,
3226- const bool planar) {
3227- return std::vector<type_place_io_t>{
3228- std::make_tuple(t,
3229- place_range[0],
3230- fft_array_type_complex_interleaved,
3231- fft_array_type_complex_interleaved)};
3232- },
3233- stride_range,
3234- stride_range,
3235- ioffset_range_zero,
3236- ooffset_range_zero,
3237- {fft_placement_notinplace})),
3238- accuracy_test::TestName);
3239+const static std::vector<size_t> small_1D_lengths = small_1D_sizes();
3240+
3241+INSTANTIATE_TEST_SUITE_P(
3242+ small_1D,
3243+ accuracy_test,
3244+ ::testing::ValuesIn(param_generator_base(test_prob,
3245+ {fft_transform_type_complex_forward},
3246+ generate_lengths({small_1D_lengths}),
3247+ {fft_precision_single},
3248+ {1},
3249+ generate_types,
3250+ stride_range,
3251+ stride_range,
3252+ ioffset_range_zero,
3253+ ooffset_range_zero,
3254+ {fft_placement_notinplace},
3255+ false /* planar */)),
3256+ accuracy_test::TestName);
3257
3258 // NB:
3259 // We have known non-unit strides issues for 1D:
3260@@ -352,7 +341,7 @@ INSTANTIATE_TEST_SUITE_P(small_1D,
3261 // main tests.
3262 //
3263 // The below test covers non-unit strides, pow of 2, middle sizes, which has SBCC/SBRC kernels
3264-// invloved.
3265+// involved.
3266 const static std::vector<size_t> pow2_range_for_stride = {4096, 8192, 524288};
3267 const static std::vector<size_t> pow2_range_for_stride_half = {4096, 8192};
3268 const static std::vector<std::vector<size_t>> stride_range_for_pow2 = {{2}, {3}};
3269@@ -364,7 +353,7 @@ INSTANTIATE_TEST_SUITE_P(
3270 ::testing::ValuesIn(param_generator_complex(test_prob,
3271 generate_lengths({pow2_range_for_stride}),
3272 precision_range_sp_dp,
3273- batch_range_1D,
3274+ batch_range_for_stride,
3275 stride_range_for_pow2,
3276 stride_range_for_pow2,
3277 ioffset_range_zero,
3278@@ -380,7 +369,7 @@ INSTANTIATE_TEST_SUITE_P(
3279 ::testing::ValuesIn(param_generator_real(test_prob,
3280 generate_lengths({pow2_range_for_stride}),
3281 precision_range_sp_dp,
3282- batch_range_1D,
3283+ batch_range_for_stride,
3284 stride_range_for_pow2,
3285 stride_range_for_pow2,
3286 ioffset_range_zero,
3287@@ -396,7 +385,7 @@ INSTANTIATE_TEST_SUITE_P(
3288 ::testing::ValuesIn(param_generator_real(test_prob,
3289 generate_lengths({pow2_range_for_stride_half}),
3290 {fft_precision_half},
3291- batch_range_1D,
3292+ batch_range_for_stride,
3293 stride_range_for_pow2,
3294 stride_range_for_pow2,
3295 ioffset_range_zero,
3296@@ -537,3 +526,24 @@ INSTANTIATE_TEST_SUITE_P(
3297 ooffset_range_zero,
3298 place_range)),
3299 accuracy_test::TestName);
3300+
3301+const static std::vector<size_t> lengths_for_disabled_autoalloc = merge_and_sort_values<size_t>(
3302+ {pow2_range, pow3_range, pow5_range, radX_range, mix_range, small_1D_lengths, prime_range},
3303+ 128);
3304+
3305+INSTANTIATE_TEST_SUITE_P(
3306+ various_1D,
3307+ accuracy_test,
3308+ ::testing::ValuesIn(param_generator(test_prob,
3309+ generate_lengths({lengths_for_disabled_autoalloc}),
3310+ precision_range_sp_dp,
3311+ batch_range_1D,
3312+ stride_range,
3313+ stride_range,
3314+ ioffset_range_zero,
3315+ ooffset_range_zero,
3316+ place_range,
3317+ false,
3318+ false,
3319+ fft_auto_allocation_off)),
3320+ accuracy_test::TestName);
3321diff --git a/clients/tests/accuracy_test_2D.cpp b/clients/tests/accuracy_test_2D.cpp
3322index 5f1b8a9..6f45abe 100644
3323--- a/clients/tests/accuracy_test_2D.cpp
3324+++ b/clients/tests/accuracy_test_2D.cpp
3325@@ -278,3 +278,24 @@ INSTANTIATE_TEST_SUITE_P(len1_swap_2D,
3326 false,
3327 false)),
3328 accuracy_test::TestName);
3329+
3330+const static std::vector<size_t> lengths_for_disabled_autoalloc
3331+ = merge_and_sort_values<size_t>({pow2_range, pow3_range, prime_range, mix_range}, 12);
3332+
3333+INSTANTIATE_TEST_SUITE_P(
3334+ various_2D,
3335+ accuracy_test,
3336+ ::testing::ValuesIn(param_generator(test_prob,
3337+ generate_lengths({lengths_for_disabled_autoalloc,
3338+ lengths_for_disabled_autoalloc}),
3339+ precision_range_sp_dp,
3340+ batch_range,
3341+ stride_range,
3342+ stride_range,
3343+ ioffset_range_zero,
3344+ ooffset_range_zero,
3345+ place_range,
3346+ false,
3347+ false,
3348+ fft_auto_allocation_off)),
3349+ accuracy_test::TestName);
3350diff --git a/clients/tests/accuracy_test_3D.cpp b/clients/tests/accuracy_test_3D.cpp
3351index f338695..f3f780d 100644
3352--- a/clients/tests/accuracy_test_3D.cpp
3353+++ b/clients/tests/accuracy_test_3D.cpp
3354@@ -284,3 +284,25 @@ INSTANTIATE_TEST_SUITE_P(
3355 false,
3356 false)),
3357 accuracy_test::TestName);
3358+
3359+const static std::vector<size_t> lengths_for_disabled_autoalloc = merge_and_sort_values<size_t>(
3360+ {pow2_range, pow3_range, pow5_range, prime_range, sbrc_range}, 5);
3361+
3362+INSTANTIATE_TEST_SUITE_P(
3363+ various_3D,
3364+ accuracy_test,
3365+ ::testing::ValuesIn(param_generator(test_prob,
3366+ generate_lengths({lengths_for_disabled_autoalloc,
3367+ lengths_for_disabled_autoalloc,
3368+ lengths_for_disabled_autoalloc}),
3369+ precision_range_sp_dp,
3370+ batch_range,
3371+ stride_range,
3372+ stride_range,
3373+ ioffset_range_zero,
3374+ ooffset_range_zero,
3375+ place_range,
3376+ false,
3377+ false,
3378+ fft_auto_allocation_off)),
3379+ accuracy_test::TestName);
3380diff --git a/clients/tests/accuracy_test_callback.cpp b/clients/tests/accuracy_test_callback.cpp
3381index 00e6026..0bae646 100644
3382--- a/clients/tests/accuracy_test_callback.cpp
3383+++ b/clients/tests/accuracy_test_callback.cpp
3384@@ -104,22 +104,27 @@ const static std::vector<std::vector<size_t>> ooffset_range = {{0, 0}, {1, 1}};
3385 auto transform_types = {fft_transform_type_complex_forward, fft_transform_type_real_forward};
3386
3387 #ifdef __HIP__
3388-INSTANTIATE_TEST_SUITE_P(callback,
3389- accuracy_test,
3390- ::testing::ValuesIn(param_generator_base(test_prob,
3391- transform_types,
3392- callback_sizes,
3393- precision_range_sp_dp,
3394- batch_range,
3395- generate_types,
3396- stride_range,
3397- stride_range,
3398- ioffset_range_zero,
3399- ooffset_range_zero,
3400- place_range,
3401- false,
3402- true)),
3403- accuracy_test::TestName);
3404+INSTANTIATE_TEST_SUITE_P(
3405+#ifdef WIN32
3406+ DISABLED_callback_no_offset,
3407+#else
3408+ callback,
3409+#endif
3410+ accuracy_test,
3411+ ::testing::ValuesIn(param_generator_base(test_prob,
3412+ transform_types,
3413+ callback_sizes,
3414+ precision_range_sp_dp,
3415+ batch_range,
3416+ generate_types,
3417+ stride_range,
3418+ stride_range,
3419+ ioffset_range_zero,
3420+ ooffset_range_zero,
3421+ place_range,
3422+ false,
3423+ true)),
3424+ accuracy_test::TestName);
3425
3426 INSTANTIATE_TEST_SUITE_P(DISABLED_callback,
3427 accuracy_test,
3428diff --git a/clients/tests/gtest_main.cpp b/clients/tests/gtest_main.cpp
3429index b574106..ee2f5ec 100644
3430--- a/clients/tests/gtest_main.cpp
3431+++ b/clients/tests/gtest_main.cpp
3432@@ -41,20 +41,18 @@
3433 #include "../hipfft_params.h"
3434 #include "hipfft/hipfft.h"
3435 #include "hipfft_accuracy_test.h"
3436-#include "hipfft_test_params.h"
3437+
3438+// initialize static class member of hipfft_params
3439+std::vector<gpubuf> hipfft_params::externally_managed_workareas = std::vector<gpubuf>();
3440
3441 // Control output verbosity:
3442 int verbose;
3443
3444-// Run a short (~5 min) test suite by setting test_prob to an appropriate value
3445-bool smoketest = false;
3446-
3447 // User-defined random seed
3448-size_t random_seed;
3449+size_t random_seed;
3450+std::random_device default_seed_dev;
3451 // Overall probability of running conventional tests
3452 double test_prob;
3453-// Probability of running tests from the emulation suite
3454-double emulation_prob;
3455 // Modifier for probability of running tests with complex interleaved data
3456 double complex_interleaved_prob_factor;
3457 // Modifier for probability of running tests with real data
3458@@ -63,9 +61,12 @@ double real_prob_factor;
3459 double complex_planar_prob_factor;
3460 // Modifier for probability of running tests with callbacks
3461 double callback_prob_factor;
3462+// Constraints for the hipfftw tests
3463+size_t max_length_for_hipfftw_test;
3464+size_t max_io_gb_for_hipfftw_test;
3465
3466 // Transform parameters for manual test:
3467-fft_params manual_params;
3468+hipfft_params manual_params;
3469
3470 // Host memory limitation for tests (GiB):
3471 size_t ramgb;
3472@@ -104,7 +105,7 @@ last_cpu_fft_cache last_cpu_fft_data;
3473 fft_params::fft_mp_lib mp_lib = fft_params::fft_mp_lib_none;
3474 // Number of multi-process ranks to launch
3475 int mp_ranks = 1;
3476-// Multi-process launch command (e.g. mpirun --np 4 /path/to/rocfft_mpi_worker)
3477+// Multi-process launch command (e.g. mpirun --np 4 /path/to/hipfft_mpi_worker)
3478 std::string mp_launch;
3479
3480 void init_gtest_flags()
3481@@ -174,9 +175,10 @@ void precompile_test_kernels(const std::string& precompile_file)
3482 continue;
3483
3484 // only care about accuracy tests
3485- if(name.find("vs_fftw/") != std::string::npos)
3486+ const auto pos = name.find("vs_fftw/");
3487+ if(pos != std::string::npos)
3488 {
3489- name.erase(0, 8);
3490+ name.erase(0, pos + 8);
3491
3492 // change batch to 1, so we don't waste time creating
3493 // multiple plans that differ only by batch
3494@@ -199,7 +201,7 @@ void precompile_test_kernels(const std::string& precompile_file)
3495 std::mt19937 dist(dev());
3496 std::shuffle(tokens.begin(), tokens.end(), dist);
3497 auto precompile_begin = std::chrono::steady_clock::now();
3498- std::cout << "precompiling " << tokens.size() << " FFT plans...\n";
3499+ std::cout << "precompiling kernels for " << tokens.size() << " tokens...\n";
3500
3501 for(auto&& t : tokens)
3502 tokenQueue.push(std::move(t));
3503@@ -222,6 +224,13 @@ void precompile_test_kernels(const std::string& precompile_file)
3504 params.from_token(token);
3505 params.validate();
3506 params.create_plan();
3507+ if(params.is_forward())
3508+ {
3509+ hipfft_params inverse_params;
3510+ inverse_params.inverse_from_forward(params);
3511+ inverse_params.validate();
3512+ inverse_params.create_plan();
3513+ }
3514 }
3515 catch(fft_params::work_buffer_alloc_failure&)
3516 {
3517@@ -271,26 +280,45 @@ int main(int argc, char* argv[])
3518 " HP - hermitian planar\n"
3519 "\n"
3520 "Usage"};
3521-
3522- // Override CLI11 help to print after later CLI11 options that are defined, and allow gtest's help
3523+ // Override CLI11 help to print it along gtest's help
3524 app.set_help_flag("");
3525- CLI::Option* opt_help = app.add_flag("-h, --help", "Produces this help message");
3526+ const auto opt_help = app.add_flag("-h, --help", "Produces this help message");
3527 app.add_option("-v, --verbose", verbose, "Print out detailed information for the tests")
3528 ->default_val(0);
3529 app.add_option("--test_prob", test_prob, "Probability of running individual tests")
3530 ->default_val(1.0)
3531 ->check(CLI::Range(0.0, 1.0));
3532+ app.add_option("--real_prob",
3533+ real_prob_factor,
3534+ "Probability multiplier for running individual real/complex transforms")
3535+ ->default_val(1.0)
3536+ ->check(CLI::PositiveNumber);
3537+ app.add_option("--planar_prob",
3538+ complex_planar_prob_factor,
3539+ "Probability multiplier for running individual planar transforms")
3540+ ->default_val(0.1)
3541+ ->check(CLI::PositiveNumber);
3542 app.add_option(
3543 "--complex_interleaved_prob_factor",
3544 complex_interleaved_prob_factor,
3545 "Probability multiplier for running individual transforms with complex interleaved data")
3546 ->default_val(1)
3547- ->check(CLI::NonNegativeNumber);
3548+ ->check(CLI::PositiveNumber);
3549 app.add_option("--callback_prob",
3550 callback_prob_factor,
3551 "Probability multiplier for running individual callback transforms")
3552 ->default_val(0.1)
3553 ->check(CLI::NonNegativeNumber);
3554+ app.add_option("--max_hipfftw_test_len",
3555+ max_length_for_hipfftw_test,
3556+ "Maximum length to be considered in hipfftw tests")
3557+ ->default_val(8192)
3558+ ->check(CLI::PositiveNumber);
3559+ app.add_option("--max_io_gb_for_hipfftw_test",
3560+ max_io_gb_for_hipfftw_test,
3561+ "Maximum size of I/O to be considered in hipfftw tests in GiB")
3562+ ->default_val(1) /* 1 GiB */
3563+ ->check(CLI::PositiveNumber);
3564
3565 app.add_option("--fftw_compare", fftw_compare, "Compare to FFTW in accuracy tests")
3566 ->default_val(true);
3567@@ -302,71 +330,43 @@ int main(int argc, char* argv[])
3568 app.add_option("--mp_launch",
3569 mp_launch,
3570 "Command line prefix to launch multi-process transforms, e.g. \"mpirun --np 4 "
3571- "/path/to/rocfft_mpi_worker\"")
3572+ "/path/to/hipfft_mpi_worker\"")
3573 ->default_val("")
3574 ->each([&](const std::string&) {
3575 if(mp_lib == fft_params::fft_mp_lib_none)
3576 {
3577- std::cout << "--mp_launch requires an mp library (see mp_lib in --help).\n";
3578- std::exit(-1);
3579+ throw CLI::ValidationError(
3580+ "--mp_launch requires an mp library (see mp_lib in --help)");
3581 }
3582 })
3583 ->needs("--mp_lib");
3584- // FIXME: Seed has no use currently
3585- // CLI::Option* opt_seed =
3586- app.add_option("--seed", random_seed, "Random seed; if unset, use an actual random seed");
3587+ app.add_option("--seed", random_seed, "Random seed; if unset, use an actual random seed")
3588+ ->default_val(default_seed_dev());
3589 app.add_flag("--smoketest", "Run a short (approx 5 minute) randomized selection of tests")
3590 ->each([&](const std::string&) {
3591- // The objective is to have an test that takes about 5 minutes, so just set the probability
3592- // per test to a small value to achieve this result.
3593+ // The objective is to have an test that takes about 5 minutes, so just set the
3594+ // probability per test to a small value to achieve this result.
3595 test_prob = 0.002;
3596 });
3597-
3598- // Try parsing initial args that will be used to configure tests
3599- // Allow extras to pass on gtest and hipFFT arguments without error
3600- app.allow_extras();
3601- try
3602- {
3603- app.parse(argc, argv);
3604- }
3605- catch(const CLI::ParseError& e)
3606- {
3607- return app.exit(e);
3608- }
3609-
3610- // NB: If we initialize gtest first, then it removes all of its own command-line
3611- // arguments and sets argc and argv correctly;
3612- ::testing::InitGoogleTest(&argc, argv);
3613-
3614- // Filename for fftw and fftwf wisdom.
3615- std::string fftw_wisdom_filename;
3616-
3617 // Token string to fully specify fft params for the manual test.
3618 std::string test_token;
3619-
3620- // Filename for precompiled kernels to be written to
3621- std::string precompile_file;
3622-
3623- // Declare the supported options. Some option pointers are declared to track passed opts.
3624- app.add_flag("--callback", "Inject load/store callbacks")->each([&](const std::string&) {
3625- manual_params.run_callbacks = true;
3626- });
3627- // app.add_flag("--version", "Print queryable version information from the rocfft library")
3628- // ->each([](const std::string&) {
3629- // rocfft_setup();
3630- // char v[256];
3631- // rocfft_get_version_string(v, 256);
3632- // std::cout << "rocFFT version: " << v << std::endl;
3633- // return EXIT_SUCCESS;
3634- // });
3635- CLI::Option* opt_token
3636+ auto* opt_token
3637 = app.add_option("--token", test_token, "Test token name for manual test")->default_val("");
3638 // Group together options that conflict with --token
3639 auto* non_token = app.add_option_group("Token Conflict", "Options excluded by --token");
3640+ non_token->excludes(opt_token);
3641+ // Declare the supported options. Some option pointers are declared to track passed opts.
3642+ non_token->add_flag("--callback", "Inject load/store callbacks")->each([&](const std::string&) {
3643+ manual_params.run_callbacks = true;
3644+ });
3645+ non_token
3646+ ->add_option("--auto_allocation",
3647+ manual_params.auto_allocate,
3648+ "hipFFT's auto-allocation behavior: \"on\", \"off\", or \"default\"")
3649+ ->default_val("default");
3650 non_token
3651 ->add_flag("--double", "Double precision transform (deprecated: use --precision double)")
3652 ->each([&](const std::string&) { manual_params.precision = fft_precision_double; });
3653- non_token->excludes(opt_token);
3654 non_token
3655 ->add_option("-t, --transformType",
3656 manual_params.transform_type,
3657@@ -406,8 +406,20 @@ int main(int argc, char* argv[])
3658 ->default_val(0);
3659 non_token->add_option("--ioffset", manual_params.ioffset, "Input offset");
3660 non_token->add_option("--ooffset", manual_params.ooffset, "Output offset");
3661- app.add_option("--isize", manual_params.isize, "Logical size of input buffer");
3662- app.add_option("--osize", manual_params.osize, "Logical size of output buffer");
3663+ non_token->add_option("--isize", manual_params.isize, "Logical size of input buffer");
3664+ non_token->add_option("--osize", manual_params.osize, "Logical size of output buffer");
3665+ non_token->add_option(
3666+ "--scalefactor", manual_params.scale_factor, "Scale factor to apply to output");
3667+ // Default value is set in fft_params.h based on if device-side PRNG was enabled.
3668+ non_token->add_option("-g, --inputGen",
3669+ manual_params.igen,
3670+ "Input data generation:\n0) PRNG sequence (device)\n"
3671+ "1) PRNG sequence (host)\n"
3672+ "2) linearly-spaced sequence (device)\n"
3673+ "3) linearly-spaced sequence (host)");
3674+ const auto* opt_version = app.add_flag(
3675+ "--version",
3676+ "Print queryable version information from the hipfft library's backend (and return)");
3677 app.add_option("--R", ramgb, "RAM limit in GiB for tests")
3678 ->default_val(host_memory::singleton().get_total_gbytes());
3679 app.add_option("--V", vramgb, "VRAM limit in GiB for tests")->default_val(0);
3680@@ -419,22 +431,19 @@ int main(int argc, char* argv[])
3681 "Skip the test if there is a runtime failure")
3682 ->default_val(true);
3683 app.add_option("-w, --wise", use_fftw_wisdom, "Use FFTW wisdom");
3684+ // Filename for fftw and fftwf wisdom.
3685+ std::string fftw_wisdom_filename;
3686 app.add_option("-W, --wisdomfile", fftw_wisdom_filename, "FFTW3 wisdom filename")
3687 ->default_val("wisdom3.txt");
3688- app.add_option("--scalefactor", manual_params.scale_factor, "Scale factor to apply to output");
3689+ // Filename for precompiled kernels to be written to
3690+ std::string precompile_file;
3691 app.add_option("--precompile",
3692 precompile_file,
3693 "Precompile kernels to a file for all test cases before running tests")
3694 ->default_val("");
3695- // Default value is set in fft_params.h based on if device-side PRNG was enabled.
3696- app.add_option("-g, --inputGen",
3697- manual_params.igen,
3698- "Input data generation:\n0) PRNG sequence (device)\n"
3699- "1) PRNG sequence (host)\n"
3700- "2) linearly-spaced sequence (device)\n"
3701- "3) linearly-spaced sequence (host)");
3702-
3703- // Parse rest of args and catch any errors here
3704+ // Try parsing initial args that will be used to configure tests
3705+ // Allow extras to pass on gtest arguments without error
3706+ app.allow_extras();
3707 try
3708 {
3709 app.parse(argc, argv);
3710@@ -444,26 +453,21 @@ int main(int argc, char* argv[])
3711 return app.exit(e);
3712 }
3713
3714- if(*opt_help)
3715+ if(!test_token.empty())
3716 {
3717- std::cout << app.help() << "\n";
3718- return EXIT_SUCCESS;
3719- }
3720+ std::cout << "Reading fft params from token:\n" << test_token << std::endl;
3721
3722- // Ensure there are no leftover options used by neither gtest nor CLI11
3723- std::vector<std::string> remaining_args = app.remaining();
3724- if(!remaining_args.empty())
3725- {
3726- std::cout << "Unrecognised option(s) found:\n ";
3727- for(auto i : app.remaining())
3728- std::cout << i << " ";
3729- std::cout << "\nRun with --help for more information.\n";
3730- return EXIT_FAILURE;
3731+ try
3732+ {
3733+ manual_params.from_token(test_token);
3734+ std::cout << "manual_params.token() = " << manual_params.token() << std::endl;
3735+ }
3736+ catch(...)
3737+ {
3738+ std::cout << "Unable to parse token." << std::endl;
3739+ return 1;
3740+ }
3741 }
3742-
3743- std::cout << "half epsilon: " << half_epsilon << "\tsingle epsilon: " << single_epsilon
3744- << "\tdouble epsilon: " << double_epsilon << std::endl;
3745-
3746 if(manual_params.length.empty())
3747 {
3748 manual_params.length.push_back(8);
3749@@ -482,6 +486,66 @@ int main(int argc, char* argv[])
3750 // TODO: add random size?
3751 }
3752
3753+ // User-settable options defining the values of all the actual test parameters
3754+ // (e.g., probability factors and value of manual_params) must be handled
3755+ // before invoking ::testing::InitGoogleTest as it triggers evaluation of said
3756+ // parameters (e.g., args of "::testing::Values{In}" in instantiations of test
3757+ // suites).
3758+ // set any "unset" parameters of manual_params before initiating gtests
3759+ // (makes the token reported by gtest less ambiguous)
3760+ manual_params.validate();
3761+
3762+ // extract remaining arguments for subsequent gtest initialization
3763+ std::vector<std::string> remaining_args = app.remaining();
3764+ std::string gtest_help_opt = "--help";
3765+ // NB: If we initialize gtest first, then it removes all of its own command-line
3766+ // arguments and sets argc and argv correctly;
3767+ std::vector<char*> gtest_argv;
3768+ gtest_argv.insert(gtest_argv.begin(), argv[0]);
3769+ for(std::string& s : remaining_args)
3770+ {
3771+ gtest_argv.push_back(&s[0]);
3772+ }
3773+ if(*opt_help)
3774+ {
3775+ // make sure gtest prints its help as well
3776+ gtest_argv.push_back(&gtest_help_opt[0]);
3777+ }
3778+ gtest_argv.push_back(NULL);
3779+ decltype(argc) gtest_argc = gtest_argv.size() - 1;
3780+ ::testing::InitGoogleTest(&gtest_argc, gtest_argv.data()); // gtest-relevant args are removed
3781+
3782+ if(*opt_help)
3783+ {
3784+ std::cout << app.help() << "\n";
3785+ return EXIT_SUCCESS;
3786+ }
3787+ // no help was used, gtest_argc is expected to be 1 at this point. If not, some of the
3788+ // used options were not recognized at all
3789+ if(gtest_argc > 1)
3790+ {
3791+ std::cout << "Unrecognised option(s) found:\n ";
3792+ for(auto i = 1; i < gtest_argc; i++)
3793+ std::cout << gtest_argv[i] << " ";
3794+ std::cout << "\nRun with --help for more information.\n";
3795+ return EXIT_FAILURE;
3796+ }
3797+
3798+ if(*opt_version || verbose > 0)
3799+ {
3800+ int hipfft_version;
3801+ hipfftGetVersion(&hipfft_version);
3802+ std::cout << "hipFFT version: " << hipfft_version << std::endl;
3803+ if(*opt_version)
3804+ {
3805+ return EXIT_SUCCESS;
3806+ }
3807+ }
3808+
3809+ std::cout << "Using random_seed = " << random_seed << std::endl;
3810+ std::cout << "half epsilon: " << half_epsilon << "\tsingle epsilon: " << single_epsilon
3811+ << "\tdouble epsilon: " << double_epsilon << std::endl;
3812+
3813 // if precompiling, tell rocFFT to use the specified cache file
3814 // to write kernels to
3815 //
3816@@ -494,11 +558,6 @@ int main(int argc, char* argv[])
3817 precompile_file.c_str());
3818 }
3819
3820- // rocfft_setup();
3821- // char v[256];
3822- // rocfft_get_version_string(v, 256);
3823- // std::cout << "rocFFT version: " << v << std::endl;
3824-
3825 #ifdef FFTW_MULTITHREAD
3826 fftw_init_threads();
3827 fftwf_init_threads();
3828@@ -508,6 +567,7 @@ int main(int argc, char* argv[])
3829
3830 // Set host memory limit from command-line options
3831 host_memory::singleton().set_limit_gbytes(ramgb);
3832+ std::cout << "Host memory limit: " << ramgb << " GiB" << std::endl;
3833
3834 if(use_fftw_wisdom)
3835 {
3836@@ -555,21 +615,6 @@ int main(int argc, char* argv[])
3837 fftwf_import_wisdom_from_string(fftwf_wisdom.c_str());
3838 }
3839
3840- if(!test_token.empty())
3841- {
3842- std::cout << "Reading fft params from token:\n" << test_token << std::endl;
3843-
3844- try
3845- {
3846- manual_params.from_token(test_token);
3847- }
3848- catch(...)
3849- {
3850- std::cout << "Unable to parse token." << std::endl;
3851- return 1;
3852- }
3853- }
3854-
3855 if(!precompile_file.empty())
3856 precompile_test_kernels(precompile_file);
3857
3858@@ -593,38 +638,14 @@ int main(int argc, char* argv[])
3859 std::cout << "double precision max l-inf epsilon: " << max_linf_eps_double << std::endl;
3860 std::cout << "double precision max l2 epsilon: " << max_l2_eps_double << std::endl;
3861
3862- // rocfft_cleanup();
3863+ hipfft_params::externally_managed_workareas.clear();
3864+
3865 return retval;
3866 }
3867
3868-TEST(manual, vs_fftw)
3869-{
3870- // Run an individual test using the provided command-line parameters.
3871-
3872- std::cout << "Manual test:" << std::endl;
3873-
3874- manual_params.validate();
3875-
3876- std::cout << "Token: " << manual_params.token() << std::endl;
3877-
3878- hipfft_params params(manual_params);
3879-
3880- try
3881- {
3882- fft_vs_reference(params, false);
3883- }
3884- catch(HOSTBUF_MEM_USAGE& e)
3885- {
3886- // explicitly clear test cache
3887- last_cpu_fft_data = last_cpu_fft_cache();
3888- GTEST_SKIP() << e.msg;
3889- }
3890- catch(ROCFFT_SKIP& e)
3891- {
3892- GTEST_SKIP() << e.msg;
3893- }
3894- catch(ROCFFT_FAIL& e)
3895- {
3896- GTEST_FAIL() << e.msg;
3897- }
3898-}
3899+// instantiation of the paramameterized accuracy_test for the
3900+// configuration set manually:
3901+INSTANTIATE_TEST_SUITE_P(manual,
3902+ accuracy_test,
3903+ ::testing::Values(static_cast<const fft_params&>(manual_params)),
3904+ accuracy_test::TestName);
3905diff --git a/clients/tests/hipfft_accuracy_test.cpp b/clients/tests/hipfft_accuracy_test.cpp
3906index e228c5f..a5c763e 100644
3907--- a/clients/tests/hipfft_accuracy_test.cpp
3908+++ b/clients/tests/hipfft_accuracy_test.cpp
3909@@ -34,6 +34,7 @@
3910 #include "../../shared/accuracy_test.h"
3911 #include "../../shared/fftw_transform.h"
3912 #include "../../shared/gpubuf.h"
3913+#include "../../shared/params_gen.h"
3914 #include "../../shared/rocfft_against_fftw.h"
3915 #include "../../shared/rocfft_complex.h"
3916 #include "../../shared/subprocess.h"
3917@@ -42,6 +43,21 @@ extern std::string mp_launch;
3918
3919 extern last_cpu_fft_cache last_cpu_fft_data;
3920
3921+// clang-format off
3922+// tokens of tests found to be symptomatic
3923+static const std::vector<std::string> symptomatic_tokens = {
3924+#ifndef _CUFFT_BACKEND
3925+// cases specific to ROCM backend
3926+#else
3927+ // cases specific to CUFFT backend
3928+ "real_forward_len_16384_half_ip_batch_4_istride_1_R_ostride_1_HI_idist_16386_odist_8193_ioffset_0_0_ooffset_0_0",
3929+ "real_forward_len_32768_half_ip_batch_4_istride_1_R_ostride_1_HI_idist_32770_odist_16385_ioffset_0_0_ooffset_0_0",
3930+ "real_forward_len_65536_half_ip_batch_2_istride_1_R_ostride_1_HI_idist_65538_odist_32769_ioffset_0_0_ooffset_0_0",
3931+#endif
3932+ // common to both backends
3933+};
3934+// clang-format on
3935+
3936 void fft_vs_reference(hipfft_params& params, bool round_trip)
3937 {
3938 switch(params.precision)
3939@@ -78,29 +94,43 @@ TEST_P(accuracy_test, vs_fftw)
3940 {
3941 case fft_params::fft_mp_lib_none:
3942 {
3943- // only do round trip for single-GPU FFTs
3944- bool round_trip = params.multiGPU <= 1;
3945+ // skipping symptomatic case(s), unless forcefully/knowingly executing normally-disabled
3946+ // test tokens (e.g., by using --gtest_also_run_disabled_tests)
3947+ const char* test_suite_name
3948+ = ::testing::UnitTest::GetInstance()->current_test_info()->test_suite_name();
3949+ if(!symptomatic_tokens.empty() && std::strstr(test_suite_name, "DISABLED") == nullptr
3950+ && std::find(symptomatic_tokens.begin(), symptomatic_tokens.end(), params.token())
3951+ != symptomatic_tokens.end())
3952+ {
3953+ GTEST_SKIP()
3954+ << "Symptomatic test that's currently disabled by default (force-skipping). Use "
3955+ "CLI arguments '--gtest_also_run_disabled_tests' to force the test execution "
3956+ "(via another test suite).";
3957+ }
3958+ // only do round trip for forward FFTs
3959+ const bool do_round_trip = params.is_forward();
3960
3961- if(!params.run_callbacks)
3962+ try
3963 {
3964- try
3965- {
3966- fft_vs_reference(params, round_trip);
3967- }
3968- catch(HOSTBUF_MEM_USAGE& e)
3969- {
3970- // explicitly clear cache
3971- last_cpu_fft_data = last_cpu_fft_cache();
3972- GTEST_SKIP() << e.msg;
3973- }
3974- catch(ROCFFT_SKIP& e)
3975- {
3976- GTEST_SKIP() << e.msg;
3977- }
3978- catch(ROCFFT_FAIL& e)
3979- {
3980- GTEST_FAIL() << e.msg;
3981- }
3982+ fft_vs_reference(params, do_round_trip);
3983+ }
3984+ catch(HOSTBUF_MEM_USAGE& e)
3985+ {
3986+ // explicitly clear cache
3987+ last_cpu_fft_data = last_cpu_fft_cache();
3988+ GTEST_SKIP() << e.msg;
3989+ }
3990+ catch(ROCFFT_SKIP& e)
3991+ {
3992+ GTEST_SKIP() << e.msg;
3993+ }
3994+ catch(const fft_params::unimplemented_exception& e)
3995+ {
3996+ GTEST_SKIP() << "Unimplemented exception: " << e.what();
3997+ }
3998+ catch(ROCFFT_FAIL& e)
3999+ {
4000+ GTEST_FAIL() << e.msg;
4001 }
4002 break;
4003 }
4004@@ -140,6 +170,11 @@ TEST_P(accuracy_test, vs_fftw)
4005 SUCCEED();
4006 }
4007
4008+INSTANTIATE_TEST_SUITE_P(DISABLED_symptomatic_tokens,
4009+ accuracy_test,
4010+ ::testing::ValuesIn(param_generator_token(test_prob, symptomatic_tokens)),
4011+ accuracy_test::TestName);
4012+
4013 #ifdef __HIP__
4014
4015 // load/store callbacks - cbdata in each is actually a scalar double
4016@@ -166,6 +201,37 @@ __device__ auto load_callback_dev_complex_float = load_callback<rocfft_complex<
4017 __device__ auto load_callback_dev_double = load_callback<double>;
4018 __device__ auto load_callback_dev_complex_double = load_callback<rocfft_complex<double>>;
4019
4020+// load/store callbacks - cbdata in each is actually a scalar double
4021+// with a number to apply to each element
4022+template <typename Tdata>
4023+__host__ __device__ Tdata
4024+ load_callback_round_trip_inverse(Tdata* input, size_t offset, void* cbdata, void* sharedMem)
4025+{
4026+ auto testdata = static_cast<const callback_test_data*>(cbdata);
4027+ // subtract each element by scalar
4028+ if(input == testdata->base)
4029+ return input[offset] - testdata->scalar;
4030+ // wrong base address passed, return something obviously wrong
4031+ else
4032+ {
4033+ // wrong base address passed, return something obviously wrong
4034+ return input[0];
4035+ }
4036+}
4037+
4038+__device__ auto load_callback_round_trip_inverse_dev_half
4039+ = load_callback_round_trip_inverse<rocfft_fp16>;
4040+__device__ auto load_callback_round_trip_inverse_dev_complex_half
4041+ = load_callback_round_trip_inverse<rocfft_complex<rocfft_fp16>>;
4042+__device__ auto load_callback_round_trip_inverse_dev_float
4043+ = load_callback_round_trip_inverse<float>;
4044+__device__ auto load_callback_round_trip_inverse_dev_complex_float
4045+ = load_callback_round_trip_inverse<rocfft_complex<float>>;
4046+__device__ auto load_callback_round_trip_inverse_dev_double
4047+ = load_callback_round_trip_inverse<double>;
4048+__device__ auto load_callback_round_trip_inverse_dev_complex_double
4049+ = load_callback_round_trip_inverse<rocfft_complex<double>>;
4050+
4051 void* get_load_callback_host(fft_array_type itype,
4052 fft_precision precision,
4053 bool round_trip_inverse = false)
4054@@ -179,22 +245,55 @@ void* get_load_callback_host(fft_array_type itype,
4055 switch(precision)
4056 {
4057 case fft_precision_half:
4058- EXPECT_EQ(hipMemcpyFromSymbol(&load_callback_host,
4059- HIP_SYMBOL(load_callback_dev_complex_half),
4060- sizeof(void*)),
4061- hipSuccess);
4062+ if(round_trip_inverse)
4063+ {
4064+ EXPECT_EQ(hipMemcpyFromSymbol(
4065+ &load_callback_host,
4066+ HIP_SYMBOL(load_callback_round_trip_inverse_dev_complex_half),
4067+ sizeof(void*)),
4068+ hipSuccess);
4069+ }
4070+ else
4071+ {
4072+ EXPECT_EQ(hipMemcpyFromSymbol(&load_callback_host,
4073+ HIP_SYMBOL(load_callback_dev_complex_half),
4074+ sizeof(void*)),
4075+ hipSuccess);
4076+ }
4077 return load_callback_host;
4078 case fft_precision_single:
4079- EXPECT_EQ(hipMemcpyFromSymbol(&load_callback_host,
4080- HIP_SYMBOL(load_callback_dev_complex_float),
4081- sizeof(void*)),
4082- hipSuccess);
4083+ if(round_trip_inverse)
4084+ {
4085+ EXPECT_EQ(hipMemcpyFromSymbol(
4086+ &load_callback_host,
4087+ HIP_SYMBOL(load_callback_round_trip_inverse_dev_complex_float),
4088+ sizeof(void*)),
4089+ hipSuccess);
4090+ }
4091+ else
4092+ {
4093+ EXPECT_EQ(hipMemcpyFromSymbol(&load_callback_host,
4094+ HIP_SYMBOL(load_callback_dev_complex_float),
4095+ sizeof(void*)),
4096+ hipSuccess);
4097+ }
4098 return load_callback_host;
4099 case fft_precision_double:
4100- EXPECT_EQ(hipMemcpyFromSymbol(&load_callback_host,
4101- HIP_SYMBOL(load_callback_dev_complex_double),
4102- sizeof(void*)),
4103- hipSuccess);
4104+ if(round_trip_inverse)
4105+ {
4106+ EXPECT_EQ(hipMemcpyFromSymbol(
4107+ &load_callback_host,
4108+ HIP_SYMBOL(load_callback_round_trip_inverse_dev_complex_double),
4109+ sizeof(void*)),
4110+ hipSuccess);
4111+ }
4112+ else
4113+ {
4114+ EXPECT_EQ(hipMemcpyFromSymbol(&load_callback_host,
4115+ HIP_SYMBOL(load_callback_dev_complex_double),
4116+ sizeof(void*)),
4117+ hipSuccess);
4118+ }
4119 return load_callback_host;
4120 }
4121 }
4122@@ -203,19 +302,55 @@ void* get_load_callback_host(fft_array_type itype,
4123 switch(precision)
4124 {
4125 case fft_precision_half:
4126- EXPECT_EQ(hipMemcpyFromSymbol(
4127- &load_callback_host, HIP_SYMBOL(load_callback_dev_half), sizeof(void*)),
4128- hipSuccess);
4129+ if(round_trip_inverse)
4130+ {
4131+ EXPECT_EQ(hipMemcpyFromSymbol(&load_callback_host,
4132+ HIP_SYMBOL(load_callback_round_trip_inverse_dev_half),
4133+ sizeof(void*)),
4134+ hipSuccess);
4135+ }
4136+ else
4137+ {
4138+ EXPECT_EQ(hipMemcpyFromSymbol(&load_callback_host,
4139+ HIP_SYMBOL(load_callback_dev_half),
4140+ sizeof(void*)),
4141+ hipSuccess);
4142+ }
4143 return load_callback_host;
4144 case fft_precision_single:
4145- EXPECT_EQ(hipMemcpyFromSymbol(
4146- &load_callback_host, HIP_SYMBOL(load_callback_dev_float), sizeof(void*)),
4147- hipSuccess);
4148+ if(round_trip_inverse)
4149+ {
4150+ EXPECT_EQ(
4151+ hipMemcpyFromSymbol(&load_callback_host,
4152+ HIP_SYMBOL(load_callback_round_trip_inverse_dev_float),
4153+ sizeof(void*)),
4154+ hipSuccess);
4155+ }
4156+ else
4157+ {
4158+ EXPECT_EQ(hipMemcpyFromSymbol(&load_callback_host,
4159+ HIP_SYMBOL(load_callback_dev_float),
4160+ sizeof(void*)),
4161+ hipSuccess);
4162+ }
4163 return load_callback_host;
4164 case fft_precision_double:
4165- EXPECT_EQ(hipMemcpyFromSymbol(
4166- &load_callback_host, HIP_SYMBOL(load_callback_dev_double), sizeof(void*)),
4167- hipSuccess);
4168+
4169+ if(round_trip_inverse)
4170+ {
4171+ EXPECT_EQ(
4172+ hipMemcpyFromSymbol(&load_callback_host,
4173+ HIP_SYMBOL(load_callback_round_trip_inverse_dev_double),
4174+ sizeof(void*)),
4175+ hipSuccess);
4176+ }
4177+ else
4178+ {
4179+ EXPECT_EQ(hipMemcpyFromSymbol(&load_callback_host,
4180+ HIP_SYMBOL(load_callback_dev_double),
4181+ sizeof(void*)),
4182+ hipSuccess);
4183+ }
4184 return load_callback_host;
4185 }
4186 }
4187@@ -245,6 +380,31 @@ __device__ auto store_callback_dev_complex_float = store_callback<rocfft_comple
4188 __device__ auto store_callback_dev_double = store_callback<double>;
4189 __device__ auto store_callback_dev_complex_double = store_callback<rocfft_complex<double>>;
4190
4191+template <typename Tdata>
4192+__host__ __device__ static void store_callback_round_trip_inverse(
4193+ Tdata* output, size_t offset, Tdata element, void* cbdata, void* sharedMem)
4194+{
4195+ auto testdata = static_cast<callback_test_data*>(cbdata);
4196+ // divide each element by scalar
4197+ if(output == testdata->base)
4198+ {
4199+ output[offset] = element / testdata->scalar;
4200+ }
4201+ // otherwise, wrong base address passed, just don't write
4202+}
4203+__device__ auto store_callback_round_trip_inverse_dev_half
4204+ = store_callback_round_trip_inverse<rocfft_fp16>;
4205+__device__ auto store_callback_round_trip_inverse_dev_complex_half
4206+ = store_callback_round_trip_inverse<rocfft_complex<rocfft_fp16>>;
4207+__device__ auto store_callback_round_trip_inverse_dev_float
4208+ = store_callback_round_trip_inverse<float>;
4209+__device__ auto store_callback_round_trip_inverse_dev_complex_float
4210+ = store_callback_round_trip_inverse<rocfft_complex<float>>;
4211+__device__ auto store_callback_round_trip_inverse_dev_double
4212+ = store_callback_round_trip_inverse<double>;
4213+__device__ auto store_callback_round_trip_inverse_dev_complex_double
4214+ = store_callback_round_trip_inverse<rocfft_complex<double>>;
4215+
4216 void* get_store_callback_host(fft_array_type otype,
4217 fft_precision precision,
4218 bool round_trip_inverse = false)
4219@@ -258,22 +418,55 @@ void* get_store_callback_host(fft_array_type otype,
4220 switch(precision)
4221 {
4222 case fft_precision_half:
4223- EXPECT_EQ(hipMemcpyFromSymbol(&store_callback_host,
4224- HIP_SYMBOL(store_callback_dev_complex_half),
4225- sizeof(void*)),
4226- hipSuccess);
4227+ if(round_trip_inverse)
4228+ {
4229+ EXPECT_EQ(hipMemcpyFromSymbol(
4230+ &store_callback_host,
4231+ HIP_SYMBOL(store_callback_round_trip_inverse_dev_complex_half),
4232+ sizeof(void*)),
4233+ hipSuccess);
4234+ }
4235+ else
4236+ {
4237+ EXPECT_EQ(hipMemcpyFromSymbol(&store_callback_host,
4238+ HIP_SYMBOL(store_callback_dev_complex_half),
4239+ sizeof(void*)),
4240+ hipSuccess);
4241+ }
4242 return store_callback_host;
4243 case fft_precision_single:
4244- EXPECT_EQ(hipMemcpyFromSymbol(&store_callback_host,
4245- HIP_SYMBOL(store_callback_dev_complex_float),
4246- sizeof(void*)),
4247- hipSuccess);
4248+ if(round_trip_inverse)
4249+ {
4250+ EXPECT_EQ(hipMemcpyFromSymbol(
4251+ &store_callback_host,
4252+ HIP_SYMBOL(store_callback_round_trip_inverse_dev_complex_float),
4253+ sizeof(void*)),
4254+ hipSuccess);
4255+ }
4256+ else
4257+ {
4258+ EXPECT_EQ(hipMemcpyFromSymbol(&store_callback_host,
4259+ HIP_SYMBOL(store_callback_dev_complex_float),
4260+ sizeof(void*)),
4261+ hipSuccess);
4262+ }
4263 return store_callback_host;
4264 case fft_precision_double:
4265- EXPECT_EQ(hipMemcpyFromSymbol(&store_callback_host,
4266- HIP_SYMBOL(store_callback_dev_complex_double),
4267- sizeof(void*)),
4268- hipSuccess);
4269+ if(round_trip_inverse)
4270+ {
4271+ EXPECT_EQ(hipMemcpyFromSymbol(
4272+ &store_callback_host,
4273+ HIP_SYMBOL(store_callback_round_trip_inverse_dev_complex_double),
4274+ sizeof(void*)),
4275+ hipSuccess);
4276+ }
4277+ else
4278+ {
4279+ EXPECT_EQ(hipMemcpyFromSymbol(&store_callback_host,
4280+ HIP_SYMBOL(store_callback_dev_complex_double),
4281+ sizeof(void*)),
4282+ hipSuccess);
4283+ }
4284 return store_callback_host;
4285 }
4286 }
4287@@ -282,21 +475,55 @@ void* get_store_callback_host(fft_array_type otype,
4288 switch(precision)
4289 {
4290 case fft_precision_half:
4291- EXPECT_EQ(hipMemcpyFromSymbol(
4292- &store_callback_host, HIP_SYMBOL(store_callback_dev_half), sizeof(void*)),
4293- hipSuccess);
4294+ if(round_trip_inverse)
4295+ {
4296+ EXPECT_EQ(
4297+ hipMemcpyFromSymbol(&store_callback_host,
4298+ HIP_SYMBOL(store_callback_round_trip_inverse_dev_half),
4299+ sizeof(void*)),
4300+ hipSuccess);
4301+ }
4302+ else
4303+ {
4304+ EXPECT_EQ(hipMemcpyFromSymbol(&store_callback_host,
4305+ HIP_SYMBOL(store_callback_dev_half),
4306+ sizeof(void*)),
4307+ hipSuccess);
4308+ }
4309 return store_callback_host;
4310 case fft_precision_single:
4311- EXPECT_EQ(hipMemcpyFromSymbol(&store_callback_host,
4312- HIP_SYMBOL(store_callback_dev_float),
4313- sizeof(void*)),
4314- hipSuccess);
4315+ if(round_trip_inverse)
4316+ {
4317+ EXPECT_EQ(
4318+ hipMemcpyFromSymbol(&store_callback_host,
4319+ HIP_SYMBOL(store_callback_round_trip_inverse_dev_float),
4320+ sizeof(void*)),
4321+ hipSuccess);
4322+ }
4323+ else
4324+ {
4325+ EXPECT_EQ(hipMemcpyFromSymbol(&store_callback_host,
4326+ HIP_SYMBOL(store_callback_dev_float),
4327+ sizeof(void*)),
4328+ hipSuccess);
4329+ }
4330 return store_callback_host;
4331 case fft_precision_double:
4332- EXPECT_EQ(hipMemcpyFromSymbol(&store_callback_host,
4333- HIP_SYMBOL(store_callback_dev_double),
4334- sizeof(void*)),
4335- hipSuccess);
4336+ if(round_trip_inverse)
4337+ {
4338+ EXPECT_EQ(
4339+ hipMemcpyFromSymbol(&store_callback_host,
4340+ HIP_SYMBOL(store_callback_round_trip_inverse_dev_double),
4341+ sizeof(void*)),
4342+ hipSuccess);
4343+ }
4344+ else
4345+ {
4346+ EXPECT_EQ(hipMemcpyFromSymbol(&store_callback_host,
4347+ HIP_SYMBOL(store_callback_dev_double),
4348+ sizeof(void*)),
4349+ hipSuccess);
4350+ }
4351 return store_callback_host;
4352 }
4353 }
4354diff --git a/clients/tests/hipfft_mpi_worker.cpp b/clients/tests/hipfft_mpi_worker.cpp
4355index 9a1e219..a4b85c6 100644
4356--- a/clients/tests/hipfft_mpi_worker.cpp
4357+++ b/clients/tests/hipfft_mpi_worker.cpp
4358@@ -23,6 +23,9 @@
4359 #include "../../shared/mpi_worker.h"
4360 #include "../hipfft_params.h"
4361
4362+// initialize static class member of hipfft_params
4363+std::vector<gpubuf> hipfft_params::externally_managed_workareas = std::vector<gpubuf>();
4364+
4365 int main(int argc, char* argv[])
4366 {
4367 return mpi_worker_main<std::array<hipfft_params, 1>, false>(
4368diff --git a/clients/tests/hipfft_test_params.h b/clients/tests/hipfft_test_params.h
4369deleted file mode 100644
4370index 4d22fbb..0000000
4371--- a/clients/tests/hipfft_test_params.h
4372+++ /dev/null
4373@@ -1,32 +0,0 @@
4374-// Copyright (C) 2022 - 2022 Advanced Micro Devices, Inc. All rights reserved.
4375-//
4376-// Permission is hereby granted, free of charge, to any person obtaining a copy
4377-// of this software and associated documentation files (the "Software"), to deal
4378-// in the Software without restriction, including without limitation the rights
4379-// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
4380-// copies of the Software, and to permit persons to whom the Software is
4381-// furnished to do so, subject to the following conditions:
4382-//
4383-// The above copyright notice and this permission notice shall be included in
4384-// all copies or substantial portions of the Software.
4385-//
4386-// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
4387-// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
4388-// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
4389-// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
4390-// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
4391-// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
4392-// THE SOFTWARE.
4393-
4394-#pragma once
4395-#ifndef TESTCONSTANTS_H
4396-#define TESTCONSTANTS_H
4397-
4398-#include "hipfft/hipfft.h"
4399-
4400-#include <stdexcept>
4401-
4402-extern int verbose;
4403-extern size_t ramgb;
4404-
4405-#endif
4406diff --git a/clients/tests/hipfftw_test.cpp b/clients/tests/hipfftw_test.cpp
4407new file mode 100644
4408index 0000000..b0ad0b0
4409--- /dev/null
4410+++ b/clients/tests/hipfftw_test.cpp
4411@@ -0,0 +1,2214 @@
4412+// Copyright (C) 2025 Advanced Micro Devices, Inc. All rights reserved.
4413+//
4414+// Permission is hereby granted, free of charge, to any person obtaining a copy
4415+// of this software and associated documentation files (the "Software"), to deal
4416+// in the Software without restriction, including without limitation the rights
4417+// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
4418+// copies of the Software, and to permit persons to whom the Software is
4419+// furnished to do so, subject to the following conditions:
4420+//
4421+// The above copyright notice and this permission notice shall be included in
4422+// all copies or substantial portions of the Software.
4423+//
4424+// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
4425+// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
4426+// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
4427+// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
4428+// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
4429+// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
4430+// THE SOFTWARE.
4431+
4432+#include "../hipfftw_helper.h"
4433+
4434+#include "../../shared/environment.h"
4435+#include "../../shared/gpubuf.h"
4436+#include "../../shared/hostbuf.h"
4437+#include "../../shared/params_gen.h"
4438+#include "../../shared/test_params.h"
4439+
4440+#include <cstdint>
4441+#include <cstring>
4442+#include <fftw3.h>
4443+#include <future>
4444+#include <gtest/gtest.h>
4445+#include <memory>
4446+#include <numeric>
4447+
4448+#ifdef WIN32
4449+#include <windows.h>
4450+#else
4451+#include <cstdlib>
4452+#endif
4453+
4454+#ifdef _OPENMP
4455+#include <omp.h>
4456+#endif
4457+
4458+// test details
4459+namespace
4460+{
4461+ //
4462+ //---------------------------------------------------------------------------------------------
4463+ // COMMONS AND HELPERS
4464+ //---------------------------------------------------------------------------------------------
4465+ //
4466+
4467+ size_t max_byte_size_for_hipfftw_tests()
4468+ {
4469+ auto get_io_byte_size_limit = []() {
4470+ size_t tmp = vramgb * ONE_GiB;
4471+ if(tmp == 0)
4472+ {
4473+ size_t free = 0, total = 0;
4474+ if(hipMemGetInfo(&free, &total) == hipSuccess)
4475+ tmp = total / 8;
4476+ }
4477+ tmp = std::min(tmp, ramgb * ONE_GiB);
4478+ tmp = std::min(tmp, max_io_gb_for_hipfftw_test * ONE_GiB);
4479+ if(verbose > 0)
4480+ {
4481+ std::cout << "Limit for the size of I/O data used in hipfftw tests: ";
4482+ if(tmp >= ONE_GiB)
4483+ std::cout << float(tmp) / ONE_GiB << " GiB." << std ::endl;
4484+ else if(tmp >= ONE_MiB)
4485+ std::cout << float(tmp) / ONE_MiB << " MiB." << std ::endl;
4486+ else
4487+ std::cout << float(tmp) / ONE_KiB << " KiB." << std ::endl;
4488+ }
4489+ return tmp;
4490+ };
4491+ static const size_t io_byte_size_limit = get_io_byte_size_limit();
4492+ return io_byte_size_limit;
4493+ }
4494+
4495+ std::ranlux24_base& get_pseudo_rng()
4496+ {
4497+ static std::ranlux24_base gen(random_seed);
4498+ return gen;
4499+ }
4500+
4501+ // NOTE: this function makes use of comparison operator < and != which must be defined for the
4502+ // specialization type T
4503+ template <typename T>
4504+ void insert_into_unique_sorted_params(std::vector<T>& unique_sorted_params,
4505+ const T& param_to_insert)
4506+ {
4507+ auto it = std::lower_bound(
4508+ unique_sorted_params.begin(), unique_sorted_params.end(), param_to_insert);
4509+ if(it != unique_sorted_params.end() && *it == param_to_insert)
4510+ return; // it's already in there generated
4511+ unique_sorted_params.insert(it, param_to_insert);
4512+ }
4513+
4514+ enum class hipfftw_internal_exception
4515+ {
4516+ none,
4517+ flow_redirection,
4518+ invalid_args,
4519+ unsupported_args,
4520+ ill_defined
4521+ };
4522+
4523+ // for well-defined internal exceptions, we may expect a specific report thereof
4524+ // in the hipfftw exception log (if logging is activated)
4525+ template <hipfftw_internal_exception>
4526+ constexpr std::string_view hipfftw_expected_log_instance;
4527+ template <>
4528+ constexpr std::string_view hipfftw_expected_log_instance<
4529+ hipfftw_internal_exception::invalid_args> = R"(Invalid argument reported)";
4530+ template <>
4531+ constexpr std::string_view hipfftw_expected_log_instance<
4532+ hipfftw_internal_exception::unsupported_args> = R"(Unsupported usage reported)";
4533+ template <>
4534+ constexpr std::string_view hipfftw_expected_log_instance<
4535+ hipfftw_internal_exception::flow_redirection> = R"(Redirecting execution flow)";
4536+
4537+ // randomizers
4538+ // Note: albeit not supported, ranks > 3 are "valid" rank argument
4539+ // --> limiting rank value to max of 10 by default to avoid ridiculously long
4540+ // lengths possibly created in automated parameter generations;
4541+ template <bool validity_flag,
4542+ int min_rank = validity_flag ? 1 : std::numeric_limits<int>::lowest(),
4543+ int max_rank = validity_flag ? 10 : 0,
4544+ std::enable_if_t<(min_rank <= max_rank) && (!validity_flag || min_rank > 0)
4545+ && (validity_flag || max_rank <= 0),
4546+ bool> = true>
4547+ int get_random_rank()
4548+ {
4549+ static std::uniform_int_distribution<int> rank_rng(min_rank, max_rank);
4550+
4551+ auto ret = rank_rng(get_pseudo_rng());
4552+ if(rank_is_valid_for_hipfftw(ret) != validity_flag)
4553+ {
4554+ throw std::runtime_error(
4555+ "failed to generate a rank value of desired validity randomly");
4556+ }
4557+ return ret;
4558+ }
4559+ template <bool validity_flag, typename type_to_consider_for_validity>
4560+ std::vector<ptrdiff_t>
4561+ get_random_lengths(int desired_rank,
4562+ ptrdiff_t max_abs_len
4563+ = std::numeric_limits<type_to_consider_for_validity>::max(),
4564+ ptrdiff_t min_abs_len = 0)
4565+ {
4566+ std::vector<ptrdiff_t> ret;
4567+ // cannot generate lengths for invalid ranks --> return empty lengths in that case
4568+ if(!rank_is_valid_for_hipfftw(desired_rank))
4569+ return ret;
4570+ if(min_abs_len < 0 || min_abs_len > max_abs_len)
4571+ throw std::invalid_argument("invalid bounds used for get_random_lengths");
4572+ // generate values that are all representable as integers
4573+ auto& pseudo_rng = get_pseudo_rng();
4574+ std::uniform_int_distribution<ptrdiff_t> length_rng(min_abs_len, max_abs_len);
4575+ // setter lambda
4576+ auto set_random_len = [&]() {
4577+ for(auto& l : ret)
4578+ {
4579+ const ptrdiff_t val = length_rng(pseudo_rng);
4580+ if constexpr(validity_flag)
4581+ l = val;
4582+ else
4583+ {
4584+ if(pseudo_rng() % 2)
4585+ l = -val;
4586+ else
4587+ l = val;
4588+ }
4589+ }
4590+ };
4591+
4592+ ret.resize(desired_rank);
4593+ set_random_len();
4594+ while(lengths_are_valid_for_hipfftw_as<type_to_consider_for_validity>(ret, desired_rank)
4595+ != validity_flag)
4596+ set_random_len();
4597+
4598+ return ret;
4599+ }
4600+ template <bool validity_flag>
4601+ int get_random_sign(fft_transform_type intended_dft_kind)
4602+ {
4603+ if(!validity_flag && is_real(intended_dft_kind))
4604+ throw std::invalid_argument("An invalid sign cannot be generated for real transforms "
4605+ "(sign is ignored in that case)");
4606+ std::uniform_int_distribution<int> sign_rng(std::numeric_limits<int>::lowest(),
4607+ std::numeric_limits<int>::max());
4608+
4609+ int tmp = validity_flag && is_complex(intended_dft_kind)
4610+ ? (is_fwd(intended_dft_kind) ? FFTW_FORWARD : FFTW_BACKWARD)
4611+ : sign_rng(get_pseudo_rng());
4612+
4613+ while(sign_is_valid_for_hipfftw(tmp, intended_dft_kind) != validity_flag)
4614+ tmp = sign_rng(get_pseudo_rng());
4615+ return tmp;
4616+ }
4617+ template <bool validity_flag>
4618+ unsigned get_random_flags()
4619+ {
4620+ std::uniform_int_distribution<unsigned> flags_rng(std::numeric_limits<unsigned>::lowest(),
4621+ std::numeric_limits<unsigned>::max());
4622+
4623+ auto tmp = flags_rng(get_pseudo_rng());
4624+ if constexpr(validity_flag)
4625+ {
4626+ tmp &= hipfftw_valid_flags_mask;
4627+ if(!flags_are_valid_for_hipfftw(tmp))
4628+ throw std::runtime_error("failed to create random valid flags");
4629+ return tmp;
4630+ }
4631+ while(flags_are_valid_for_hipfftw(tmp))
4632+ tmp = flags_rng(get_pseudo_rng());
4633+ return tmp;
4634+ }
4635+ size_t get_random_idx(size_t upper_bound)
4636+ {
4637+ if(upper_bound == 0)
4638+ throw std::invalid_argument("upper_bound must be strictly positive for get_random_idx");
4639+ std::uniform_int_distribution<size_t> idx_rng(0, upper_bound - 1);
4640+ return idx_rng(get_pseudo_rng());
4641+ }
4642+
4643+ // calculates the threshold value X such that max_data_idx is no greater
4644+ // (resp. larger) than num_elems, if all elements of lengths are all no greater
4645+ // (resp. larger) than X, and lengths.size() == rank [using bisection]
4646+ ptrdiff_t get_len_threshold(size_t num_elems, int rank, bool is_real_inplace)
4647+ {
4648+ if(rank < 1)
4649+ throw std::invalid_argument("invalid rank used in get_len_threshold");
4650+ if(num_elems == 0)
4651+ return 1;
4652+ constexpr ptrdiff_t X_max = std::numeric_limits<ptrdiff_t>::max();
4653+ // we need to find X in [0, X_max] s.t.
4654+ // largest_idx(X) <= num_elems && largest_idx(X + 1) > num_elems
4655+ auto largest_idx = [&](ptrdiff_t X) {
4656+ size_t ret = rank > 1 && is_real_inplace ? 2 * (X / 2 + 1) : X;
4657+ for(auto i = 1; i < rank; i++)
4658+ ret *= X;
4659+ return ret;
4660+ };
4661+ // initialization
4662+ ptrdiff_t X_down
4663+ = rank == 1 ? static_cast<ptrdiff_t>(num_elems)
4664+ : static_cast<ptrdiff_t>(std::floor(std::pow(num_elems, 1.0 / rank)));
4665+ ptrdiff_t diff = 1;
4666+ ptrdiff_t X_up = X_down;
4667+ while(largest_idx(X_up) <= num_elems && X_up < X_max)
4668+ {
4669+ X_down = X_up;
4670+ X_up = X_up <= X_max - diff ? X_up + diff : X_max;
4671+ diff *= 2;
4672+ }
4673+ diff = 1;
4674+ while(largest_idx(X_down) > num_elems && X_down > 0)
4675+ {
4676+ X_up = X_down;
4677+ X_down = X_down >= diff ? X_down - diff : 0;
4678+ diff *= 2;
4679+ }
4680+ // bisection
4681+ while(X_up - X_down > 1)
4682+ {
4683+ const auto tmp = (X_up + X_down) / 2;
4684+ if(largest_idx(tmp) <= num_elems)
4685+ X_down = tmp;
4686+ else
4687+ X_up = tmp;
4688+ }
4689+ return X_down;
4690+ }
4691+
4692+ template <fft_precision prec>
4693+ size_t max_num_elems_for_data_size(size_t data_byte_size, fft_transform_type dft_kind)
4694+ {
4695+ return data_byte_size
4696+ / (is_real(dft_kind) ? sizeof(hipfftw_real_t<prec>)
4697+ : sizeof(hipfftw_complex_t<prec>));
4698+ }
4699+
4700+ // exception for hip runtime error(s) specifically
4701+ struct hip_runtime_error : public std::runtime_error
4702+ {
4703+ const hipError_t hip_error;
4704+ hip_runtime_error(const std::string& info, hipError_t hip_status)
4705+ : std::runtime_error::runtime_error(info)
4706+ , hip_error(hip_status)
4707+
4708+ {
4709+ }
4710+ };
4711+ int get_current_device_id()
4712+ {
4713+ int ret = hipInvalidDeviceId;
4714+ const auto hip_status = hipGetDevice(&ret);
4715+ if(hip_status != hipSuccess)
4716+ throw hip_runtime_error("hipGetDevice failed.", hip_status);
4717+ return ret;
4718+ }
4719+
4720+ //
4721+ //---------------------------------------------------------------------------------------------
4722+ // EXISTENCE OF UTILITY FUNCTIONS
4723+ //---------------------------------------------------------------------------------------------
4724+ //
4725+
4726+ template <fft_precision prec>
4727+ void test_existence_of_utility_functions()
4728+ {
4729+ try
4730+ {
4731+ // call utility functions - they need to exist but don't need to work
4732+ const auto& hipfftw_ = hipfftw_funcs<prec>::get_instance();
4733+ hipfftw_.print_plan(nullptr);
4734+ hipfftw_.set_timelimit(0.0);
4735+ hipfftw_.cost(nullptr);
4736+ hipfftw_.flops(nullptr, nullptr, nullptr, nullptr);
4737+ hipfftw_.cleanup();
4738+ }
4739+ catch(const hipfftw_undefined_function_ptr& e)
4740+ {
4741+ GTEST_FAIL() << "Undefined function pointers detected. Error info: " << e.what();
4742+ }
4743+ catch(...)
4744+ {
4745+ GTEST_FAIL() << "Unexpected failure";
4746+ }
4747+ }
4748+
4749+ //
4750+ //---------------------------------------------------------------------------------------------
4751+ // ALLOCATION AND FREE
4752+ //---------------------------------------------------------------------------------------------
4753+ //
4754+
4755+ enum class hipfftw_alloc_func_type
4756+ {
4757+ unspecified,
4758+ real,
4759+ complex
4760+ };
4761+ bool hipfftw_alloc_func_type_is_valid(hipfftw_alloc_func_type func)
4762+ {
4763+ return func == hipfftw_alloc_func_type::unspecified || func == hipfftw_alloc_func_type::real
4764+ || func == hipfftw_alloc_func_type::complex;
4765+ }
4766+ // bit mask to prevent allocation kind(s), by increasing "rank" to enable meaningful
4767+ // comparison operators (implicitly defined based on the underlying type)
4768+ enum hipfftw_alloc_memkind : unsigned
4769+ {
4770+ none = 0x0,
4771+ pageable_host = 0x1 << 0,
4772+ pinned_host = 0x1 << 1,
4773+ any = pageable_host | pinned_host
4774+ };
4775+
4776+ const std::vector<hipfftw_alloc_memkind> hipfftw_possible_memkinds
4777+ = {pinned_host, pageable_host};
4778+
4779+ bool hipfftw_alloc_kind_is_valid(hipfftw_alloc_memkind kind)
4780+ {
4781+ return kind == (kind & hipfftw_alloc_memkind::any);
4782+ }
4783+ std::string hipfftw_alloc_kind_to_string(hipfftw_alloc_memkind kind)
4784+ {
4785+ if(!hipfftw_alloc_kind_is_valid(kind))
4786+ throw std::invalid_argument("alloc_kind_to_string: invalid kind");
4787+ if(kind == hipfftw_alloc_memkind::none)
4788+ return "none";
4789+ if(std::find(hipfftw_possible_memkinds.begin(), hipfftw_possible_memkinds.end(), kind)
4790+ == hipfftw_possible_memkinds.end())
4791+ {
4792+ // several values enabled
4793+ std::string ret;
4794+ for(auto to_consider : hipfftw_possible_memkinds)
4795+ {
4796+ if(!(kind & to_consider))
4797+ continue;
4798+ if(!ret.empty())
4799+ ret += "_or_";
4800+ ret += hipfftw_alloc_kind_to_string(to_consider);
4801+ }
4802+ return ret;
4803+ }
4804+ // kind is a well-defined value in hipfftw_possible_memkinds
4805+ switch(kind)
4806+ {
4807+ case hipfftw_alloc_memkind::pinned_host:
4808+ return "pinned_host";
4809+ break;
4810+ case hipfftw_alloc_memkind::pageable_host:
4811+ return "pageable_host";
4812+ break;
4813+ default:
4814+ throw std::runtime_error("alloc_kind_to_string: internal error encountered "
4815+ "(unexpected value for kind)");
4816+ break;
4817+ }
4818+ // unreachable
4819+ }
4820+
4821+ template <fft_precision prec>
4822+ struct hipfftw_malloc_params
4823+ {
4824+ size_t alloc_arg;
4825+ hipfftw_alloc_func_type alloc_func;
4826+ hipfftw_alloc_memkind alloc_kind;
4827+
4828+ size_t get_byte_size() const
4829+ {
4830+ return alloc_arg
4831+ * (alloc_func == hipfftw_alloc_func_type::unspecified
4832+ ? sizeof(char)
4833+ : (alloc_func == hipfftw_alloc_func_type::real
4834+ ? sizeof(hipfftw_real_t<prec>)
4835+ : sizeof(hipfftw_complex_t<prec>)));
4836+ }
4837+
4838+ std::string to_string() const
4839+ {
4840+ if(!hipfftw_alloc_func_type_is_valid(alloc_func))
4841+ throw std::runtime_error("invalid type of allocation function");
4842+ if(!hipfftw_alloc_kind_is_valid(alloc_kind))
4843+ throw std::runtime_error("invalid allocation kind(s)");
4844+
4845+ std::string ret;
4846+ if constexpr(prec == fft_precision_single)
4847+ ret += "fftwf_";
4848+ else
4849+ ret += "fftw_";
4850+
4851+ if(alloc_func == hipfftw_alloc_func_type::unspecified)
4852+ ret += "malloc_";
4853+ else if(alloc_func == hipfftw_alloc_func_type::real)
4854+ ret += "alloc_real_";
4855+ else
4856+ ret += "alloc_complex_";
4857+ ret += std::to_string(alloc_arg);
4858+ ret += "_alloc_kind_" + hipfftw_alloc_kind_to_string(alloc_kind);
4859+ return ret;
4860+ }
4861+ // for using with insert_into_unique_sorted_params
4862+ bool operator<(const hipfftw_malloc_params& other) const
4863+ {
4864+ return to_string() < other.to_string();
4865+ }
4866+ bool operator==(const hipfftw_malloc_params& other) const
4867+ {
4868+ return to_string() == other.to_string();
4869+ }
4870+ };
4871+
4872+ template <fft_precision prec>
4873+ std::vector<hipfftw_malloc_params<prec>> params_for_testing_hipfftw_malloc()
4874+ {
4875+ std::vector<hipfftw_malloc_params<prec>> ret;
4876+ // testing argument value 0 and a randomly chosen one (max 64MiB in byte size, arbitrarily chosen)
4877+ constexpr size_t max_test_alloc_size = 1ULL << 26;
4878+
4879+ const std::vector<hipfftw_alloc_func_type> func_range
4880+ = {hipfftw_alloc_func_type::unspecified,
4881+ hipfftw_alloc_func_type::real,
4882+ hipfftw_alloc_func_type::complex};
4883+ std::vector<hipfftw_alloc_memkind> memkind_range;
4884+ // add all possible combinations of memory kinds:
4885+ for(auto kind = static_cast<std::underlying_type<hipfftw_alloc_memkind>::type>(
4886+ hipfftw_alloc_memkind::none);
4887+ kind <= static_cast<std::underlying_type<hipfftw_alloc_memkind>::type>(
4888+ hipfftw_alloc_memkind::any);
4889+ kind++)
4890+ {
4891+ memkind_range.push_back(static_cast<hipfftw_alloc_memkind>(kind));
4892+ }
4893+
4894+ hipfftw_malloc_params<prec> to_add;
4895+ for(auto func : func_range)
4896+ {
4897+ size_t max_arg = max_test_alloc_size;
4898+ if(func == hipfftw_alloc_func_type::real)
4899+ max_arg /= sizeof(hipfftw_real_t<prec>);
4900+ else if(func == hipfftw_alloc_func_type::complex)
4901+ max_arg /= sizeof(hipfftw_complex_t<prec>);
4902+ std::uniform_int_distribution<size_t> arg_rng(1, max_arg);
4903+ for(auto kind : memkind_range)
4904+ {
4905+ for(auto arg : {size_t(0), arg_rng(get_pseudo_rng())})
4906+ {
4907+ to_add.alloc_arg = arg;
4908+ to_add.alloc_func = func;
4909+ to_add.alloc_kind = kind;
4910+ insert_into_unique_sorted_params(ret, to_add);
4911+ }
4912+ }
4913+ }
4914+ return ret;
4915+ }
4916+
4917+ template <fft_precision prec>
4918+ class hipfftw_allocation_test : public ::testing::TestWithParam<hipfftw_malloc_params<prec>>
4919+ {
4920+ protected:
4921+ void* test_allocation = nullptr;
4922+ bool expect_no_allocation;
4923+ std::map<hipfftw_alloc_memkind, std::unique_ptr<EnvironmentSetTemp>> temp_alloc_limit_env;
4924+
4925+ void SetUp() override
4926+ {
4927+ if(test_allocation)
4928+ GTEST_FAIL() << "Starting from an unclean slate (test_allocation is not nullptr)";
4929+ const hipfftw_malloc_params<prec>& params = this->GetParam();
4930+ // check validity of params
4931+ if(!hipfftw_alloc_kind_is_valid(params.alloc_kind))
4932+ GTEST_FAIL() << "invalid value for allocation kind";
4933+ if(!hipfftw_alloc_func_type_is_valid(params.alloc_func))
4934+ GTEST_FAIL() << "unknown allocation function";
4935+
4936+ size_t limit_for_alloc_kind = 0;
4937+ for(auto alloc_kind_candidate : hipfftw_possible_memkinds)
4938+ {
4939+ if(alloc_kind_candidate != hipfftw_alloc_memkind::pinned_host
4940+ && alloc_kind_candidate != hipfftw_alloc_memkind::pageable_host)
4941+ {
4942+ throw std::runtime_error("unexpected memory allocation kind "
4943+ + hipfftw_alloc_kind_to_string(alloc_kind_candidate));
4944+ }
4945+ const std::string control_env_var
4946+ = alloc_kind_candidate == hipfftw_alloc_memkind::pinned_host
4947+ ? "HIPFFTW_BYTE_SIZE_LIMIT_PINNED_HOST_ALLOC"
4948+ : "HIPFFTW_BYTE_SIZE_LIMIT_PAGEABLE_HOST_ALLOC";
4949+
4950+ if(alloc_kind_candidate & params.alloc_kind)
4951+ {
4952+ const auto test_user_limit = rocfft_getenv(control_env_var.c_str());
4953+ limit_for_alloc_kind
4954+ = std::max(limit_for_alloc_kind,
4955+ test_user_limit.empty() ? std::numeric_limits<size_t>::max()
4956+ : size_t(std::stoull(test_user_limit)));
4957+ }
4958+ else
4959+ {
4960+ // disable the other possible allocation kind(s) by temporarily
4961+ // setting the corresponding byte size limit to 0
4962+ temp_alloc_limit_env[alloc_kind_candidate]
4963+ = std::make_unique<EnvironmentSetTemp>(control_env_var.c_str(), "0");
4964+ // skip if temporary limit(s) was(were) not successfully set
4965+ const auto tmp_limit = rocfft_getenv(control_env_var.c_str());
4966+ if(tmp_limit.empty() || std::stoull(tmp_limit) != 0)
4967+ {
4968+ GTEST_SKIP() << "failed to set environment variable disabling "
4969+ << hipfftw_alloc_kind_to_string(alloc_kind_candidate)
4970+ << " allocation(s) by hipFFTW";
4971+ }
4972+ }
4973+ }
4974+ const size_t req_byte_size = params.get_byte_size();
4975+ expect_no_allocation = params.alloc_kind == hipfftw_alloc_memkind::none
4976+ || req_byte_size == 0 || req_byte_size > limit_for_alloc_kind;
4977+ }
4978+ void TearDown() override
4979+ {
4980+ temp_alloc_limit_env.clear();
4981+ const hipfftw_funcs<prec>& hipfftw_impl = hipfftw_funcs<prec>::get_instance();
4982+ if(test_allocation && !hipfftw_impl.free.may_be_used())
4983+ GTEST_FAIL() << "An allocation was created but it can't be freed";
4984+ // note: free should be stable even with nullptr
4985+ if(hipfftw_impl.free.may_be_used())
4986+ hipfftw_impl.free(test_allocation);
4987+ }
4988+
4989+ void test_malloc_write_and_read()
4990+ {
4991+ const hipfftw_malloc_params<prec>& params = this->GetParam();
4992+ const hipfftw_funcs<prec>& hipfftw_impl = hipfftw_funcs<prec>::get_instance();
4993+ hipfftw_exception_logger exception_logger;
4994+
4995+ struct allocation_test_to_be_skipped : std::runtime_error
4996+ {
4997+ using std::runtime_error::runtime_error;
4998+ };
4999+ struct allocation_test_failed : std::runtime_error
5000+ {
The diff has been truncated for viewing.

Subscribers

People subscribed via source and target branches