%global upstreamname RCCL %global rocm_release 6.0 %global rocm_patch 2 %global rocm_version %{rocm_release}.%{rocm_patch} %global toolchain rocm # hipcc does not support some clang flags %global build_cxxflags %(echo %{optflags} | sed -e 's/-fstack-protector-strong/-Xarch_host -fstack-protector-strong/' -e 's/-fcf-protection/-Xarch_host -fcf-protection/') Name: rccl Version: %{rocm_version} Release: %autorelease Summary: ROCm Communication Collectives Library Url: https://github.com/ROCmSoftwarePlatform License: BSD-3-Clause AND MIT AND Apache-2.0 # From License.txt the main license is BSD 3 # Modifications from Microsoft is MIT # The NVIDIA based header files below are Apache-2.0 # src/include/nvtx3/nv*.h and similar # The URL for NVIDIA in the License.txt https://github.com/NVIDIA/NVTX is Apache-2.0 Source0: %{url}/%{upstreamname}/archive/rocm-%{rocm_version}.tar.gz#/%{upstreamname}-%{rocm_version}.tar.gz Patch0: 0001-prepare-rccl-cmake-for-fedora.patch BuildRequires: cmake BuildRequires: clang BuildRequires: clang-devel BuildRequires: compiler-rt BuildRequires: hipify BuildRequires: lld BuildRequires: llvm-devel BuildRequires: ninja-build BuildRequires: rocm-cmake BuildRequires: rocm-comgr-devel BuildRequires: rocm-hip-devel BuildRequires: rocm-runtime-devel BuildRequires: rocm-rpm-macros BuildRequires: rocm-rpm-macros-modules BuildRequires: rocm-smi-devel # Only x86_64 works right now: ExclusiveArch: x86_64 %description RCCL (pronounced "Rickle") is a stand-alone library of standard collective communication routines for GPUs, implementing all-reduce, all-gather, reduce, broadcast, reduce-scatter, gather, scatter, and all-to-all. There is also initial support for direct GPU-to-GPU send and receive operations. It has been optimized to achieve high bandwidth on platforms using PCIe, xGMI as well as networking using InfiniBand Verbs or TCP/IP sockets. RCCL supports an arbitrary number of GPUs installed in a single node or multiple nodes, and can be used in either single- or multi-process (e.g., MPI) applications. The collective operations are implemented using ring and tree algorithms and have been optimized for throughput and latency. For best performance, small operations can be either batched into larger operations or aggregated through the API. %package devel Summary: Headers and libraries for %{name} Requires: %{name}%{?_isa} = %{version}-%{release} %description devel Headers and libraries for %{name} %prep %autosetup -p1 -n %{name}-rocm-%{version} %build %cmake -G Ninja \ -DBUILD_FILE_REORG_BACKWARD_COMPATIBILITY=OFF \ -DCMAKE_INSTALL_LIBDIR=%{_libdir} \ -DROCM_SYMLINK_LIBS=OFF \ -DHIP_PLATFORM=amd %cmake_build %install %cmake_install %files %license LICENSE.txt %dir %{_libdir}/cmake/%{name} %dir %{_datadir}/%{name} %dir %{_datadir}/%{name}/msccl-algorithms %{_libdir}/lib%{name}.so.1{,.*} %exclude %{_docdir}/%{name}/LICENSE.txt %files devel %doc README.md %{_datadir}/%{name}/ %{_includedir}/%{name} %{_libdir}/cmake/%{name} %{_libdir}/lib%{name}.so %changelog %autochangelog