It's everyone's duty to squash the green behemoth.
Please be aware that this blog post is not comprehensive; it is a by-product of my recent MD performance benchmarks on AMD GPUs. You may be able to compile and run the applications covered here without difficulty using older versions of ROCm and specific GPUs. It is also possible that some of the compatibility issues mentioned here will be resolved in future software updates.
Note that the content of this blog post does not appear in the official documentation, manual, readme, or wiki of the corresponding applications, and ready-made solutions to the issues described here are very hard to find on the Internet. This does not make the official documentation any less valuable. On the contrary, reading it carefully, sentence by sentence, will make this blog post much easier to follow.
- OS details: Ubuntu 22.04.3 LTS, Linux 6.2.0-26-generic x86_64, GNU 11.4.0
- GPU architectures (codenames) covered: GCN 5.1 (gfx906), RDNA 2 (gfx1030), and RDNA 3 (gfx1100)
- ROCm versions (refer to radeon.com) covered: 5.4.6, 5.5.3, and 5.6.0.
Please note that ROCm 5.4.6 is the final version bundled with LLVM 15. Later versions of ROCm bundle LLVM 16, which is changing rapidly, and those changes may introduce compiler compatibility issues. However, if you can compile and run the applications with the latest version of ROCm, you may get better performance.
With ROCm 5.4.6 and OpenSYCL-0.9.4, it is possible to compile and run GROMACS directly on gfx906 or gfx1030 GPUs. For detailed instructions, please refer to the GROMACS 2023.2 Manual, specifically pages 15-17.
When using ROCm 5.5.3 or 5.6.0 with OpenSYCL-develop 25Jul2023, you can also compile and run GROMACS. To successfully compile OpenSYCL-develop, simply add '-DWITH_SSCP_COMPILER=OFF' to the CMake command. It's worth noting that the OpenSYCL develop branch is currently undergoing changes from 'hipSYCL' to 'OpenSYCL' in the source code. As a result, when compiling GROMACS based on it, there may be some warnings in the CMake configuration logs. However, these warnings should not cause any problems.
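As a rough illustration of where the flag goes, here is a minimal Python sketch that assembles a CMake configure command for OpenSYCL-develop. Only '-DWITH_SSCP_COMPILER=OFF' comes from the text above; the function name, the directory paths, and the install prefix are hypothetical placeholders, and a real build will likely need further options described in the OpenSYCL documentation.

```python
import shlex


def opensycl_configure_command(source_dir, build_dir, install_prefix,
                               disable_sscp=True):
    """Return a CMake configure command as a list of arguments.

    All paths are placeholders chosen by the caller. The only
    OpenSYCL-specific option set here is WITH_SSCP_COMPILER.
    """
    cmd = [
        "cmake",
        "-S", source_dir,
        "-B", build_dir,
        f"-DCMAKE_INSTALL_PREFIX={install_prefix}",
    ]
    if disable_sscp:
        # The flag named in the post for building OpenSYCL-develop.
        cmd.append("-DWITH_SSCP_COMPILER=OFF")
    return cmd


if __name__ == "__main__":
    # Hypothetical locations, for illustration only.
    command = opensycl_configure_command(
        "OpenSYCL", "OpenSYCL/build", "/opt/opensycl-develop"
    )
    print(shlex.join(command))
```

Building the command as a list keeps each argument intact if it is later passed to `subprocess.run`, and `shlex.join` gives a copy-pasteable shell line.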
In terms of performance, GROMACS compiled using ROCm 5.6.0 and OpenSYCL-develop 25Jul2023 shows significant improvements over previous versions, and I encountered no bugs when testing it on gfx906 and gfx1030 GPUs. On the gfx1100 (RDNA 3) GPU, however, stability is a concern with all three versions of ROCm: performance fluctuates, and there is a high probability of mdrun getting stuck after running for a while (similar feedback was reported on the GROMACS forum in June of this year). When this happens, rocm-smi can no longer read the GPU status information.
With ROCm 5.4.6 or 5.5.3, Amber 22 compiles and runs on gfx906, gfx1030, and gfx1100 GPUs, and I encountered no bugs during testing. However, the following modifications to the source code are required:
1) Remove the 3rd TIP-TODO in src/pmemd/src/cuda/ptxmacros.h (lines 130-170).
2) When compiling for GPU architectures that do not exist on the local machine, it is necessary to add the 'AMDGPU_TARGETS' and 'GPU_TARGETS' variables to the CMake command in the 'compile_with_hip.sh' file. This will enable optimizations targeting those specific GPU architectures. Note that multiple targets can be set simultaneously.
3) For RDNA GPUs, add '-D HIP_WARP64=OFF' to the CMake command and check line 85 of src/pmemd/src/cuda/ptxmacros.h. Add further conditions there as needed, such as '|| defined(__gfx1100__)' for the 7900XTX GPU.
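Step 3 can be sketched as a small Python helper that decides, from a gfx target name, whether the RDNA-specific changes apply. This is an illustration only: the function names are hypothetical, and the rule "gfx10xx and gfx11xx are RDNA" is an assumption that holds for the GPUs covered in this post (RDNA GPUs run HIP with 32-wide wavefronts, whereas GCN GPUs such as gfx906 use 64). The helper only prints suggestions; it does not touch the Amber sources, and the condition on line 85 of ptxmacros.h still has to be checked by hand.

```python
import re


def gfx_major(target):
    """Return the major version of a gfx target name, e.g. 'gfx1030' -> 10."""
    if not re.fullmatch(r"gfx[0-9]{1,2}[0-9a-f]{2}", target):
        raise ValueError(f"unrecognised gfx target: {target!r}")
    # The last two characters are minor version and stepping.
    return int(target[3:-2])


def is_rdna(target):
    """Assumption: major versions 10 and 11 are RDNA 1-3; 9 is GCN 5."""
    return gfx_major(target) in (10, 11)


def amber_rdna_hints(target):
    """Return the extra CMake arguments and preprocessor clause for one target."""
    if not is_rdna(target):
        return {"cmake_args": [], "ptxmacros_clause": None}
    return {
        # Flag quoted in step 3 of the post.
        "cmake_args": ["-D", "HIP_WARP64=OFF"],
        # Clause to append to the existing condition in ptxmacros.h.
        "ptxmacros_clause": f"|| defined(__{target}__)",
    }


if __name__ == "__main__":
    for name in ("gfx906", "gfx1030", "gfx1100"):
        print(name, amber_rdna_hints(name))
```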
In terms of performance, Amber 22 compiled using ROCm 5.5.3 exhibits significant improvements compared to ROCm 5.4.6. However, no successful compilation method has been found using ROCm 5.6.0.
With ROCm 5.4.6, 5.5.3, or 5.6.0, OpenMM compiles and runs directly on gfx906, gfx1030, and gfx1100 GPUs. During testing, I occasionally observed GPU scheduling inactivity on the gfx1100 GPU; no bugs were encountered in the rest of the tests.
When compiling for GPU architectures that are not present on the local machine, it is necessary to include the 'AMDGPU_TARGETS' and 'GPU_TARGETS' variables in the CMake command. This will enable optimizations targeting those specific GPU architectures. Note that multiple targets can be set simultaneously.
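A minimal Python sketch of how the two variables might be assembled for several targets at once. The variable names 'AMDGPU_TARGETS' and 'GPU_TARGETS' come from the text above; the helper name is hypothetical. The sketch relies on CMake's convention that list values are separated by semicolons, which is why the value must be quoted when typed into a shell.

```python
import shlex


def gpu_target_flags(targets):
    """Return CMake -D flags that set both target variables.

    Duplicates are dropped while the original order is kept.
    """
    unique = list(dict.fromkeys(targets))
    if not unique:
        raise ValueError("at least one gfx target is required")
    joined = ";".join(unique)  # CMake list separator
    return [f"-DAMDGPU_TARGETS={joined}", f"-DGPU_TARGETS={joined}"]


if __name__ == "__main__":
    flags = gpu_target_flags(["gfx906", "gfx1030", "gfx1100"])
    # shlex.join quotes the semicolons so the shell does not split on them.
    print(shlex.join(flags))
```

Passing the flags as list items to `subprocess.run` avoids the quoting problem altogether, since no shell is involved.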
In terms of performance, OpenMM gets faster with each successive ROCm version of the three (with the default VkFFT backend). The difference is more apparent for smaller systems.
With ROCm 5.4.6, 5.5.3, or 5.6.0, LAMMPS compiles and runs on gfx906, gfx1030, and gfx1100 GPUs, and I encountered no bugs during testing.
For RDNA GPUs, certain modifications are required. Specifically, you need to replace the bundled copy of Kokkos (lib/kokkos) with the latest version of Kokkos (4.1.0). Additionally, you need to change "14" to "17" in lines 146 and 147 of cmake/CMakeLists.txt.
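The edit to cmake/CMakeLists.txt can be sketched in Python as a line-targeted replacement. I read the change as raising the C++ standard from 14 to 17, which Kokkos 4.x requires, but that reading is an assumption, and the sample lines in the demo are hypothetical stand-ins rather than the real file contents. The helper refuses to patch a line that does not contain the expected value, which guards against line numbers that have shifted in a different LAMMPS release.

```python
import re


def replace_on_lines(lines, line_numbers=(146, 147), old="14", new="17"):
    """Return a copy of `lines` with `old` replaced by `new` on the
    given 1-based line numbers only. Raises if a line is missing or
    does not contain `old` as a whole token.
    """
    patched = list(lines)
    pattern = re.compile(rf"\b{re.escape(old)}\b")
    for number in line_numbers:
        index = number - 1
        if index >= len(patched):
            raise ValueError(f"file has no line {number}")
        patched[index], count = pattern.subn(new, patched[index])
        if count == 0:
            raise ValueError(f"line {number} does not contain {old!r}")
    return patched


if __name__ == "__main__":
    # Hypothetical file contents, for illustration only.
    demo = ["# filler\n"] * 145 + [
        "example_setting(14)\n",
        "another_example(14)\n",
    ]
    result = replace_on_lines(demo)
    print(result[145], result[146], sep="")
```

To apply it to a real checkout, read the file with `pathlib.Path.read_text().splitlines(keepends=True)`, pass the list through the helper, and write the joined result back.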
During the CMake configuration step, you need to specify the necessary packages in cmake/presets/basic.cmake. Furthermore, you should specify the GPU architecture code in cmake/presets/kokkos-hip.cmake, keeping in mind that only one GPU architecture can be specified. For a list of code mappings, refer to lib/kokkos/cmake/kokkos_arch.cmake.
In terms of performance, LAMMPS gets faster with each successive ROCm version of the three. The difference is more apparent for smaller systems.
The compatibility lists of the four Apps are as follows:
In terms of absolute performance, cost-effectiveness, and compatibility, both OpenMM and LAMMPS are currently suitable for ordinary users to fully "switch to AMD", and GROMACS and Amber users can also start to try it. ROCm 5.4.6 or 5.5.3, coupled with a GCN 5.1 or RDNA 2 GPU, offers seamless compatibility with all four applications mentioned in this blog post, while ROCm 5.6.0 provides the best performance. In some cases, minor modifications to the source code of the applications are necessary.
It is worth mentioning that as of the time of this blog post, the latest version of ROCm (5.6.0) does not officially support RDNA 3 GPUs. According to AMD's official notification, RDNA 3 GPUs will receive official support sometime this fall. Therefore, any specific issues related to RDNA 3 in GROMACS may only be temporary and are expected to be resolved in the future (I hope so).