author: Yiyang Wu <xgreenlandforwyy@gmail.com> 2024-04-08 14:10:07 +0800
committer: Benda Xu <heroxbd@gentoo.org> 2024-05-18 11:45:39 +0800
commit: bd29b1782a348f0017d74a92204acd6f2704c96c
tree: 17fcb3c5006e4fb45e9903671dca9da72ac57d67 /eclass
parent: app-emacs/org-mode: Stabilize 9.6.26 ALLARCHES, #932121
rocm.eclass: remove xnack flag for broader compatibility
Initially, rocm.eclass appended an xnack [1,2] feature flag to gfx9 GPU targets, since ROCm upstream does this in many of its math libraries, e.g. rocBLAS [3]. That list includes gfx90a:xnack+, indicating that xnack is usable on the MI200 series, so rocm.eclass appended :xnack+ to gfx90a. It turns out, however, that xnack- is also common on the MI200 series, and restricting to xnack+ produces GPU kernels that are incompatible with xnack- mode. In addition, the community is exploring xnack on other gfx9 GPUs [4,5], which rocm.eclass previously restricted to xnack-.

Without an xnack feature flag, GPU kernels are compiled in "xnack any" mode, which can run in either mode, potentially sacrificing some performance [6,7], although there is no direct evidence of this. rocFFT reports no performance penalty [8].

For the reasons above, do not append an xnack feature flag to AMDGPU_TARGETS, so that the resulting kernels are compatible with GPUs operating in either xnack mode.

[1] https://wiki.gentoo.org/wiki/ROCm#XNACK_target_feature
[2] https://rocm.docs.amd.com/en/latest/conceptual/gpu-memory.html#xnack
[3] https://github.com/ROCm/rocBLAS/blob/release/rocm-rel-5.0/CMakeLists.txt#L201
[4] https://niconiconi.neocities.org/tech-notes/xnack-on-amd-gpus/
[5] https://arxiv.org/abs/2401.02680
[6] https://llvm.org/docs/AMDGPUUsage.html#target-features
[7] https://docs.olcf.ornl.gov/systems/crusher_quick_start_guide.html#compiling-hip-kernels-for-specific-xnack-modes
[8] https://github.com/ROCm/rocFFT/commit/cd2689360ba3b3579d044d8925838ff307b4b4cf

Signed-off-by: Yiyang Wu <xgreenlandforwyy@gmail.com>
Signed-off-by: Benda Xu <heroxbd@gentoo.org>
Closes: https://github.com/gentoo/gentoo/pull/36254
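To illustrate the effect of the change, here is a minimal standalone sketch that compares the output of the old and new get_amdgpu_flags logic. The function names get_amdgpu_flags_old and get_amdgpu_flags_new and the sample target list are assumptions made for this illustration only; in a real ebuild, AMDGPU_TARGETS is set by the package manager. The sketch uses ${AMDGPU_TARGETS} rather than the commit's ${AMDGPU_TARGETS[@]} so that it also runs under a plain POSIX sh; for a scalar variable the two expand to the same words in bash.

```shell
# Hypothetical target list, chosen only for this illustration.
AMDGPU_TARGETS="gfx906 gfx90a gfx1030"

# Sketch of the logic before this commit: each gfx9 target is pinned
# to one xnack mode, other targets are passed through unchanged.
get_amdgpu_flags_old() {
	local gpu_target target_feature amdgpu_target_flags
	for gpu_target in ${AMDGPU_TARGETS}; do
		target_feature=
		case ${gpu_target} in
			gfx906|gfx908) target_feature=:xnack- ;;
			gfx90a) target_feature=:xnack+ ;;
		esac
		amdgpu_target_flags="${amdgpu_target_flags}${gpu_target}${target_feature};"
	done
	echo "${amdgpu_target_flags}"
}

# Sketch of the logic after this commit: every target is emitted as-is,
# followed by a semicolon, with no xnack suffix.
get_amdgpu_flags_new() {
	echo $(printf "%s;" ${AMDGPU_TARGETS})
}

get_amdgpu_flags_old   # prints gfx906:xnack-;gfx90a:xnack+;gfx1030;
get_amdgpu_flags_new   # prints gfx906;gfx90a;gfx1030;
```

With the suffix gone, the same list is passed on for every GPU, and the choice of xnack mode is left to the compiler's default ("xnack any") instead of being fixed by the eclass.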
Diffstat (limited to 'eclass')
-rw-r--r-- eclass/rocm.eclass 19
1 file changed, 2 insertions, 17 deletions
diff --git a/eclass/rocm.eclass b/eclass/rocm.eclass
index 9804ecde97d0..e03e8bdd507a 100644
--- a/eclass/rocm.eclass
+++ b/eclass/rocm.eclass
@@ -1,4 +1,4 @@
-# Copyright 2022-2023 Gentoo Authors
+# Copyright 2022-2024 Gentoo Authors
# Distributed under the terms of the GNU General Public License v2
# @ECLASS: rocm.eclass
@@ -201,22 +201,7 @@ unset -f _rocm_set_globals
# Append default target feature to GPU arch. See
# https://llvm.org/docs/AMDGPUUsage.html#target-features
get_amdgpu_flags() {
-	local amdgpu_target_flags
-	for gpu_target in ${AMDGPU_TARGETS}; do
-		local target_feature=
-		case ${gpu_target} in
-			gfx906|gfx908)
-				target_feature=:xnack-
-				;;
-			gfx90a)
-				target_feature=:xnack+
-				;;
-			*)
-				;;
-		esac
-		amdgpu_target_flags+="${gpu_target}${target_feature};"
-	done
-	echo "${amdgpu_target_flags}"
+	echo $(printf "%s;" ${AMDGPU_TARGETS[@]})
}
# @FUNCTION: check_amdgpu