Warning: Permanently added '3.87.35.131' (ED25519) to the list of known hosts.

You can reproduce this build on your computer by running:

  sudo dnf install copr-rpmbuild
  /usr/bin/copr-rpmbuild --verbose --drop-resultdir --task-url https://copr.fedorainfracloud.org/backend/get-build-task/7299609-fedora-39-aarch64 --chroot fedora-39-aarch64

Version: 0.72
PID: 9007
Logging PID: 9008
Task: {'allow_user_ssh': False, 'appstream': False, 'background': False, 'build_id': 7299609, 'buildroot_pkgs': [], 'chroot': 'fedora-39-aarch64', 'enable_net': True, 'fedora_review': False, 'git_hash': 'dd083b26b8c3f37fe5cccca0143b784f0db535d4', 'git_repo': 'https://copr-dist-git.fedorainfracloud.org/git/rezso/ML/pytorch', 'isolation': 'default', 'memory_reqs': 2048, 'package_name': 'pytorch', 'package_version': '2.4.0-20240412.0.git7efaf54d.cu12_3', 'project_dirname': 'ML', 'project_name': 'ML', 'project_owner': 'rezso', 'repo_priority': None, 'repos': [{'baseurl': 'https://download.copr.fedorainfracloud.org/results/rezso/ML/fedora-39-aarch64/', 'id': 'copr_base', 'name': 'Copr repository', 'priority': None}, {'baseurl': 'https://download.copr.fedorainfracloud.org/results/rezso/CUDA/fedora-39-aarch64/', 'id': 'copr_rezso_CUDA', 'name': 'Additional repo copr_rezso_CUDA'}, {'baseurl': 'http://developer.download.nvidia.com/compute/cuda/repos/rhel8/x86_64', 'id': 'http_developer_download_nvidia_com_compute_cuda_repos_rhel8_x86_64', 'name': 'Additional repo http_developer_download_nvidia_com_compute_cuda_repos_rhel8_x86_64'}, {'baseurl': 'http://developer.download.nvidia.com/compute/cuda/repos/rhel8/sbsa', 'id': 'http_developer_download_nvidia_com_compute_cuda_repos_rhel8_sbsa', 'name': 'Additional repo http_developer_download_nvidia_com_compute_cuda_repos_rhel8_sbsa'}, {'baseurl': 'http://developer.download.nvidia.com/compute/cuda/repos/rhel8/ppc64le', 'id': 'http_developer_download_nvidia_com_compute_cuda_repos_rhel8_ppc64le', 'name': 'Additional repo http_developer_download_nvidia_com_compute_cuda_repos_rhel8_ppc64le'}], 'sandbox': 'rezso/ML--rezso', 'source_json': {}, 'source_type': None, 'ssh_public_keys': None, 'submitter': 'rezso', 'tags': [], 'task_id': '7299609-fedora-39-aarch64', 'timeout': 172800, 'uses_devel_repo': False, 'with_opts': [], 'without_opts': []}

Running: git clone https://copr-dist-git.fedorainfracloud.org/git/rezso/ML/pytorch /var/lib/copr-rpmbuild/workspace/workdir-lh244o7v/pytorch --depth 500 --no-single-branch --recursive
cmd: ['git', 'clone', 'https://copr-dist-git.fedorainfracloud.org/git/rezso/ML/pytorch', '/var/lib/copr-rpmbuild/workspace/workdir-lh244o7v/pytorch', '--depth', '500', '--no-single-branch', '--recursive']
cwd: .
rc: 0
stdout:
stderr: Cloning into '/var/lib/copr-rpmbuild/workspace/workdir-lh244o7v/pytorch'...

Running: git checkout dd083b26b8c3f37fe5cccca0143b784f0db535d4 --
cmd: ['git', 'checkout', 'dd083b26b8c3f37fe5cccca0143b784f0db535d4', '--']
cwd: /var/lib/copr-rpmbuild/workspace/workdir-lh244o7v/pytorch
rc: 0
stdout:
stderr: Note: switching to 'dd083b26b8c3f37fe5cccca0143b784f0db535d4'.

You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by switching back to a branch.

If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -c with the switch command. Example:

  git switch -c <new-branch-name>

Or undo this operation with:

  git switch -

Turn off this advice by setting config variable advice.detachedHead to false

HEAD is now at dd083b2 automatic import of pytorch
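[Note: the detached-HEAD message above is standard git advice, not an error. A minimal sketch of acting on it, assuming you are replaying these steps locally and want to keep work made at this commit; the branch name is hypothetical:

  git switch -c my-pytorch-work          # create a branch at the checked-out commit
  git config advice.detachedHead false   # silence this advice in future checkouts

Both are stock git commands; nothing in the build itself requires them.]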
Running: copr-distgit-client sources
cmd: ['copr-distgit-client', 'sources']
cwd: /var/lib/copr-rpmbuild/workspace/workdir-lh244o7v/pytorch
rc: 0
stdout:
stderr: INFO: Reading stdout from command: git rev-parse --abbrev-ref HEAD
INFO: Reading stdout from command: git rev-parse HEAD
INFO: Reading sources specification file: sources
/usr/bin/tail: /var/lib/copr-rpmbuild/main.log: file truncated
Running (timeout=172800): unbuffer mock --spec /var/lib/copr-rpmbuild/workspace/workdir-lh244o7v/pytorch/pytorch.spec --sources /var/lib/copr-rpmbuild/workspace/workdir-lh244o7v/pytorch --resultdir /var/lib/copr-rpmbuild/results --uniqueext 1712885724.178146 -r /var/lib/copr-rpmbuild/results/configs/child.cfg
INFO: mock.py version 5.5 starting (python version = 3.12.1, NVR = mock-5.5-1.fc39), args: /usr/libexec/mock/mock --spec /var/lib/copr-rpmbuild/workspace/workdir-lh244o7v/pytorch/pytorch.spec --sources /var/lib/copr-rpmbuild/workspace/workdir-lh244o7v/pytorch --resultdir /var/lib/copr-rpmbuild/results --uniqueext 1712885724.178146 -r /var/lib/copr-rpmbuild/results/configs/child.cfg
Start(bootstrap): init plugins
INFO: tmpfs initialized
INFO: selinux enabled
INFO: chroot_scan: initialized
INFO: compress_logs: initialized
Finish(bootstrap): init plugins
Start: init plugins
INFO: tmpfs initialized
INFO: selinux enabled
INFO: chroot_scan: initialized
INFO: compress_logs: initialized
Finish: init plugins
INFO: Signal handler active
Start: run
INFO: Start(/var/lib/copr-rpmbuild/workspace/workdir-lh244o7v/pytorch/pytorch.spec) Config(fedora-39-aarch64)
Start: clean chroot
Finish: clean chroot
Mock Version: 5.5
INFO: Mock Version: 5.5
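[Note: everything below is mock bootstrapping and populating the buildroot described by the task printed at the top of this log. If you only want to inspect that task definition without installing copr-rpmbuild, fetching the --task-url directly should suffice; this assumes the backend answers a plain GET with the same JSON copr-rpmbuild consumes:

  curl -s https://copr.fedorainfracloud.org/backend/get-build-task/7299609-fedora-39-aarch64 | python3 -m json.tool

python3 -m json.tool is just one convenient pretty-printer for the result.]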
Start(bootstrap): chroot init
INFO: mounting tmpfs at /var/lib/mock/fedora-39-aarch64-bootstrap-1712885724.178146/root.
INFO: calling preinit hooks
INFO: enabled root cache
INFO: enabled package manager cache
Start(bootstrap): cleaning package manager metadata
Finish(bootstrap): cleaning package manager metadata
INFO: Guessed host environment type: unknown
INFO: Using bootstrap image: registry.fedoraproject.org/fedora:39
INFO: Pulling image: registry.fedoraproject.org/fedora:39
INFO: Copy content of container registry.fedoraproject.org/fedora:39 to /var/lib/mock/fedora-39-aarch64-bootstrap-1712885724.178146/root
INFO: Checking that registry.fedoraproject.org/fedora:39 image matches host's architecture
INFO: mounting registry.fedoraproject.org/fedora:39 with podman image mount
INFO: image registry.fedoraproject.org/fedora:39 as /var/lib/containers/storage/overlay/6c0d01dbfbcdbe058c454adffacaef5e1d93022daaf574d94ec38917733fcb5a/merged
INFO: umounting image registry.fedoraproject.org/fedora:39 (/var/lib/containers/storage/overlay/6c0d01dbfbcdbe058c454adffacaef5e1d93022daaf574d94ec38917733fcb5a/merged) with podman image umount
INFO: Package manager dnf detected and used (fallback)
INFO: Bootstrap image not marked ready
Start(bootstrap): installing dnf tooling
No matches found for the following disable plugin patterns: local, spacewalk, versionlock
Copr repository                                  22 MB/s | 1.1 MB     00:00
Additional repo copr_rezso_CUDA                 1.3 MB/s |  61 kB     00:00
Additional repo http_developer_download_nvidia_ 123 MB/s | 3.3 MB     00:00
Additional repo http_developer_download_nvidia_  27 MB/s | 2.0 MB     00:00
Additional repo http_developer_download_nvidia_  86 MB/s | 1.8 MB     00:00
fedora                                           55 MB/s |  86 MB     00:01
updates                                          48 MB/s |  33 MB     00:00
Package python3-dnf-4.19.2-1.fc39.noarch is already installed.
Dependencies resolved.
================================================================================
 Package                    Arch     Version           Repository        Size
================================================================================
Installing:
 python3-dnf-plugins-core   noarch   4.6.0-1.fc39      updates          317 k
Installing dependencies:
 dbus-libs                  aarch64  1:1.14.10-1.fc39  fedora           156 k
 python3-dateutil           noarch   1:2.8.2-10.fc39   fedora           355 k
 python3-dbus               aarch64  1.3.2-4.fc39      fedora           157 k
 python3-distro             noarch   1.8.0-6.fc39      fedora            49 k
 python3-six                noarch   1.16.0-12.fc39    fedora            41 k
 python3-systemd            aarch64  235-5.fc39        fedora           107 k

Transaction Summary
================================================================================
Install  7 Packages

Total download size: 1.2 M
Installed size: 4.7 M
Downloading Packages:
(1/7): dbus-libs-1.14.10-1.fc39.aarch64.rpm     6.3 MB/s | 156 kB     00:00
(2/7): python3-dateutil-2.8.2-10.fc39.noarch.rp  14 MB/s | 355 kB     00:00
(3/7): python3-dbus-1.3.2-4.fc39.aarch64.rpm    5.8 MB/s | 157 kB     00:00
(4/7): python3-distro-1.8.0-6.fc39.noarch.rpm    16 MB/s |  49 kB     00:00
(5/7): python3-six-1.16.0-12.fc39.noarch.rpm     14 MB/s |  41 kB     00:00
(6/7): python3-systemd-235-5.fc39.aarch64.rpm    33 MB/s | 107 kB     00:00
(7/7): python3-dnf-plugins-core-4.6.0-1.fc39.no  78 MB/s | 317 kB     00:00
--------------------------------------------------------------------------------
Total                                           2.1 MB/s | 1.2 MB     00:00
Running transaction check
Transaction check succeeded.
Running transaction test
Transaction test succeeded.
Running transaction
  Preparing        :                                                        1/1
  Installing       : python3-systemd-235-5.fc39.aarch64                     1/7
  Installing       : python3-six-1.16.0-12.fc39.noarch                      2/7
  Installing       : python3-dateutil-1:2.8.2-10.fc39.noarch                3/7
  Installing       : python3-distro-1.8.0-6.fc39.noarch                     4/7
  Installing       : dbus-libs-1:1.14.10-1.fc39.aarch64                     5/7
  Installing       : python3-dbus-1.3.2-4.fc39.aarch64                      6/7
  Installing       : python3-dnf-plugins-core-4.6.0-1.fc39.noarch           7/7
  Running scriptlet: python3-dnf-plugins-core-4.6.0-1.fc39.noarch           7/7
  Verifying        : dbus-libs-1:1.14.10-1.fc39.aarch64                     1/7
  Verifying        : python3-dateutil-1:2.8.2-10.fc39.noarch                2/7
  Verifying        : python3-dbus-1.3.2-4.fc39.aarch64                      3/7
  Verifying        : python3-distro-1.8.0-6.fc39.noarch                     4/7
  Verifying        : python3-six-1.16.0-12.fc39.noarch                      5/7
  Verifying        : python3-systemd-235-5.fc39.aarch64                     6/7
  Verifying        : python3-dnf-plugins-core-4.6.0-1.fc39.noarch           7/7

Installed:
  dbus-libs-1:1.14.10-1.fc39.aarch64
  python3-dateutil-1:2.8.2-10.fc39.noarch
  python3-dbus-1.3.2-4.fc39.aarch64
  python3-distro-1.8.0-6.fc39.noarch
  python3-dnf-plugins-core-4.6.0-1.fc39.noarch
  python3-six-1.16.0-12.fc39.noarch
  python3-systemd-235-5.fc39.aarch64

Complete!
Finish(bootstrap): installing dnf tooling
Start(bootstrap): creating root cache
Finish(bootstrap): creating root cache
Finish(bootstrap): chroot init
Start: chroot init
INFO: mounting tmpfs at /var/lib/mock/fedora-39-aarch64-1712885724.178146/root.
INFO: calling preinit hooks
INFO: enabled root cache
INFO: enabled package manager cache
Start: cleaning package manager metadata
Finish: cleaning package manager metadata
INFO: enabled HW Info plugin
INFO: Package manager dnf detected and used (direct choice)
INFO: Buildroot is handled by package management downloaded with a bootstrap image:
  rpm-4.19.1.1-1.fc39.aarch64
  rpm-sequoia-1.6.0-1.fc39.aarch64
  python3-dnf-4.19.2-1.fc39.noarch
  python3-dnf-plugins-core-4.6.0-1.fc39.noarch
  yum-4.19.2-1.fc39.noarch
Start: installing minimal buildroot with dnf
No matches found for the following disable plugin patterns: local, spacewalk, versionlock
Copr repository                                 6.4 MB/s | 1.1 MB     00:00
Additional repo copr_rezso_CUDA                 1.2 MB/s |  61 kB     00:00
Additional repo http_developer_download_nvidia_ 108 MB/s | 3.3 MB     00:00
Additional repo http_developer_download_nvidia_  71 MB/s | 2.0 MB     00:00
Additional repo http_developer_download_nvidia_  98 MB/s | 1.8 MB     00:00
fedora                                           50 MB/s |  86 MB     00:01
updates                                          34 MB/s |  33 MB     00:00
Dependencies resolved.
================================================================================ Package Arch Version Repo Size ================================================================================ Installing group/module packages: bash aarch64 5.2.26-1.fc39 updates 1.8 M bzip2 aarch64 1.0.8-16.fc39 fedora 52 k coreutils aarch64 9.3-5.fc39 updates 1.2 M cpio aarch64 2.14-4.fc39 fedora 277 k diffutils aarch64 3.10-3.fc39 fedora 396 k fedora-release-common noarch 39-36 updates 19 k findutils aarch64 1:4.9.0-5.fc39 fedora 495 k gawk aarch64 5.2.2-2.fc39 fedora 1.1 M glibc-minimal-langpack aarch64 2.38-99.fc39 copr_base 67 k grep aarch64 3.11-3.fc39 fedora 295 k gzip aarch64 1.12-6.fc39 fedora 164 k info aarch64 7.0.3-3.fc39 fedora 179 k patch aarch64 2.7.6-22.fc39 fedora 123 k redhat-rpm-config noarch 266-1.fc39 updates 78 k rpm-build aarch64 4.19.1.1-1.fc39 updates 79 k sed aarch64 4.8-14.fc39 fedora 304 k shadow-utils aarch64 2:4.14.0-2.fc39 updates 1.3 M tar aarch64 2:1.35-2.fc39 fedora 854 k unzip aarch64 6.0-62.fc39 fedora 183 k util-linux aarch64 2.39.4-1.fc39 updates 1.2 M which aarch64 2.21-40.fc39 fedora 42 k xz aarch64 5.4.4-1.fc39 fedora 556 k Installing dependencies: alternatives aarch64 1.26-1.fc39 updates 38 k ansible-srpm-macros noarch 1-12.fc39 updates 21 k audit-libs aarch64 3.1.2-8.fc39 updates 118 k authselect aarch64 1.4.3-1.fc39 fedora 150 k authselect-libs aarch64 1.4.3-1.fc39 fedora 249 k basesystem noarch 11-18.fc39 fedora 7.2 k binutils aarch64 2.40-14.fc39 updates 6.1 M binutils-gold aarch64 2.40-14.fc39 updates 945 k bzip2-libs aarch64 1.0.8-16.fc39 fedora 43 k ca-certificates noarch 2023.2.60_v7.0.306-2.fc39 fedora 837 k coreutils-common aarch64 9.3-5.fc39 updates 2.1 M cracklib aarch64 2.9.11-2.fc39 fedora 94 k crypto-policies noarch 20231204-1.git1e3a2e4.fc39 updates 100 k curl aarch64 8.2.1-4.fc39 updates 341 k cyrus-sasl-lib aarch64 2.1.28-11.fc39 fedora 781 k debugedit aarch64 5.0-12.fc39 updates 78 k dwz aarch64 0.15-3.fc39 fedora 136 k ed aarch64 1.19-4.fc39 fedora 78 k efi-srpm-macros noarch 5-9.fc39 fedora 22 k elfutils aarch64 0.191-2.fc39 updates 560 k elfutils-debuginfod-client aarch64 0.191-2.fc39 updates 38 k elfutils-default-yama-scope noarch 0.191-2.fc39 updates 13 k elfutils-libelf aarch64 0.191-2.fc39 updates 209 k elfutils-libs aarch64 0.191-2.fc39 updates 263 k fedora-gpg-keys noarch 39-1 fedora 130 k fedora-release noarch 39-36 updates 8.6 k fedora-release-identity-basic noarch 39-36 updates 9.4 k fedora-repos noarch 39-1 fedora 9.3 k file aarch64 5.44-5.fc39 fedora 49 k file-libs aarch64 5.44-5.fc39 fedora 729 k filesystem aarch64 3.18-6.fc39 fedora 1.1 M fonts-srpm-macros noarch 1:2.0.5-12.fc39 fedora 26 k forge-srpm-macros noarch 0.2.0-3.fc39 updates 19 k fpc-srpm-macros noarch 1.3-8.fc39 fedora 7.4 k gdb-minimal aarch64 14.2-1.fc39 updates 3.9 M gdbm-libs aarch64 1:1.23-4.fc39 fedora 56 k ghc-srpm-macros noarch 1.6.1-2.fc39 fedora 7.8 k glibc aarch64 2.38-99.fc39 copr_base 1.7 M glibc-common aarch64 2.38-99.fc39 copr_base 338 k glibc-gconv-extra aarch64 2.38-99.fc39 copr_base 1.9 M gmp aarch64 1:6.2.1-5.fc39 fedora 266 k gnat-srpm-macros noarch 6-3.fc39 fedora 8.8 k go-srpm-macros noarch 3.5.0-1.fc39 updates 28 k jansson aarch64 2.13.1-7.fc39 fedora 46 k kernel-srpm-macros noarch 1.0-20.fc39 fedora 10 k keyutils-libs aarch64 1.6.3-1.fc39 updates 32 k krb5-libs aarch64 1.21.2-3.fc39 updates 770 k libacl aarch64 2.3.1-9.fc39 updates 24 k libarchive aarch64 3.7.1-1.fc39 fedora 402 k libattr aarch64 2.5.1-8.fc39 fedora 18 k libblkid aarch64 
2.39.4-1.fc39 updates 116 k libbrotli aarch64 1.1.0-1.fc39 fedora 345 k libcap aarch64 2.48-9.fc39 updates 69 k libcap-ng aarch64 0.8.3-8.fc39 fedora 32 k libcom_err aarch64 1.47.0-2.fc39 fedora 26 k libcurl aarch64 8.2.1-4.fc39 updates 317 k libdb aarch64 5.3.28-56.fc39 fedora 735 k libeconf aarch64 0.5.2-2.fc39 updates 30 k libevent aarch64 2.1.12-9.fc39 fedora 254 k libfdisk aarch64 2.39.4-1.fc39 updates 157 k libffi aarch64 3.4.4-4.fc39 fedora 38 k libgcc aarch64 13.2.1-7.fc39 updates 99 k libgomp aarch64 13.2.1-7.fc39 updates 316 k libidn2 aarch64 2.3.7-1.fc39 updates 120 k libmount aarch64 2.39.4-1.fc39 updates 153 k libnghttp2 aarch64 1.55.1-4.fc39 updates 76 k libnsl2 aarch64 2.0.0-6.fc39 fedora 30 k libpkgconf aarch64 1.9.5-2.fc39 fedora 38 k libpsl aarch64 0.21.2-4.fc39 fedora 63 k libpwquality aarch64 1.4.5-6.fc39 fedora 120 k libselinux aarch64 3.5-5.fc39 fedora 86 k libsemanage aarch64 3.5-4.fc39 fedora 117 k libsepol aarch64 3.5-2.fc39 fedora 311 k libsigsegv aarch64 2.14-5.fc39 fedora 27 k libsmartcols aarch64 2.39.4-1.fc39 updates 65 k libssh aarch64 0.10.6-2.fc39 updates 213 k libssh-config noarch 0.10.6-2.fc39 updates 9.0 k libstdc++ aarch64 13.2.1-7.fc39 updates 818 k libtasn1 aarch64 4.19.0-3.fc39 fedora 73 k libtirpc aarch64 1.3.4-1.rc3.fc39 updates 94 k libunistring aarch64 1.1-5.fc39 fedora 540 k libutempter aarch64 1.2.1-10.fc39 fedora 27 k libuuid aarch64 2.39.4-1.fc39 updates 28 k libverto aarch64 0.3.2-6.fc39 fedora 21 k libxcrypt aarch64 4.4.36-2.fc39 fedora 123 k libxml2 aarch64 2.10.4-3.fc39 fedora 689 k libzstd aarch64 1.5.6-1.fc39 updates 284 k lua-libs aarch64 5.4.6-3.fc39 fedora 131 k lua-srpm-macros noarch 1-13.fc39 updates 8.7 k lz4-libs aarch64 1.9.4-4.fc39 fedora 68 k mpfr aarch64 4.2.0-3.fc39 fedora 319 k ncurses-base noarch 6.4-7.20230520.fc39.1 updates 88 k ncurses-libs aarch64 6.4-7.20230520.fc39.1 updates 326 k ocaml-srpm-macros noarch 8-2.fc39 fedora 14 k openblas-srpm-macros noarch 2-14.fc39 fedora 7.5 k openldap aarch64 2.6.6-1.fc39 fedora 251 k openssl-libs aarch64 1:3.1.1-4.fc39 fedora 2.0 M p11-kit aarch64 0.25.3-1.fc39 updates 495 k p11-kit-trust aarch64 0.25.3-1.fc39 updates 141 k package-notes-srpm-macros noarch 0.5-9.fc39 fedora 11 k pam aarch64 1.5.3-3.fc39 updates 552 k pam-libs aarch64 1.5.3-3.fc39 updates 57 k pcre2 aarch64 10.42-1.fc39.2 fedora 219 k pcre2-syntax noarch 10.42-1.fc39.2 fedora 143 k perl-srpm-macros noarch 1-51.fc39 fedora 8.0 k pkgconf aarch64 1.9.5-2.fc39 fedora 42 k pkgconf-m4 noarch 1.9.5-2.fc39 fedora 14 k pkgconf-pkg-config aarch64 1.9.5-2.fc39 fedora 9.6 k popt aarch64 1.19-3.fc39 fedora 66 k publicsuffix-list-dafsa noarch 20240107-1.fc39 updates 58 k pyproject-srpm-macros noarch 1.12.0-1.fc39 updates 14 k python-srpm-macros noarch 3.12-4.fc39 fedora 25 k qt5-srpm-macros noarch 5.15.12-1.fc39 updates 8.4 k qt6-srpm-macros noarch 6.6.2-1.fc39 updates 8.9 k readline aarch64 8.2-6.fc39 updates 212 k rpm aarch64 4.19.1.1-1.fc39 updates 536 k rpm-build-libs aarch64 4.19.1.1-1.fc39 updates 91 k rpm-libs aarch64 4.19.1.1-1.fc39 updates 305 k rpm-sequoia aarch64 1.6.0-1.fc39 updates 817 k rpmautospec-rpm-macros noarch 0.6.3-1.fc39 updates 10 k rust-srpm-macros noarch 26.2-1.fc39 updates 13 k setup noarch 2.14.4-1.fc39 fedora 154 k sqlite-libs aarch64 3.42.0-7.fc39 fedora 677 k systemd-libs aarch64 254.10-1.fc39 updates 665 k util-linux-core aarch64 2.39.4-1.fc39 updates 505 k xxhash-libs aarch64 0.8.2-1.fc39 fedora 35 k xz-libs aarch64 5.4.4-1.fc39 fedora 106 k zip aarch64 3.0-39.fc39 fedora 262 k zlib aarch64 
1.2.13-4.fc39 fedora 93 k zstd aarch64 1.5.6-1.fc39 updates 445 k Installing Groups: Buildsystem building group Transaction Summary ================================================================================ Install 152 Packages Total download size: 52 M Installed size: 302 M Downloading Packages: (1/152): glibc-common-2.38-99.fc39.aarch64.rpm 14 MB/s | 338 kB 00:00 (2/152): glibc-gconv-extra-2.38-99.fc39.aarch64 62 MB/s | 1.9 MB 00:00 (3/152): glibc-2.38-99.fc39.aarch64.rpm 49 MB/s | 1.7 MB 00:00 (4/152): glibc-minimal-langpack-2.38-99.fc39.aa 6.4 MB/s | 67 kB 00:00 (5/152): basesystem-11-18.fc39.noarch.rpm 762 kB/s | 7.2 kB 00:00 (6/152): authselect-1.4.3-1.fc39.aarch64.rpm 11 MB/s | 150 kB 00:00 (7/152): bzip2-1.0.8-16.fc39.aarch64.rpm 26 MB/s | 52 kB 00:00 (8/152): bzip2-libs-1.0.8-16.fc39.aarch64.rpm 19 MB/s | 43 kB 00:00 (9/152): authselect-libs-1.4.3-1.fc39.aarch64.r 15 MB/s | 249 kB 00:00 (10/152): ca-certificates-2023.2.60_v7.0.306-2. 128 MB/s | 837 kB 00:00 (11/152): cpio-2.14-4.fc39.aarch64.rpm 45 MB/s | 277 kB 00:00 (12/152): cracklib-2.9.11-2.fc39.aarch64.rpm 38 MB/s | 94 kB 00:00 (13/152): cyrus-sasl-lib-2.1.28-11.fc39.aarch64 221 MB/s | 781 kB 00:00 (14/152): dwz-0.15-3.fc39.aarch64.rpm 45 MB/s | 136 kB 00:00 (15/152): diffutils-3.10-3.fc39.aarch64.rpm 84 MB/s | 396 kB 00:00 (16/152): ed-1.19-4.fc39.aarch64.rpm 37 MB/s | 78 kB 00:00 (17/152): efi-srpm-macros-5-9.fc39.noarch.rpm 13 MB/s | 22 kB 00:00 (18/152): fedora-gpg-keys-39-1.noarch.rpm 47 MB/s | 130 kB 00:00 (19/152): fedora-repos-39-1.noarch.rpm 2.7 MB/s | 9.3 kB 00:00 (20/152): file-5.44-5.fc39.aarch64.rpm 15 MB/s | 49 kB 00:00 (21/152): file-libs-5.44-5.fc39.aarch64.rpm 73 MB/s | 729 kB 00:00 (22/152): filesystem-3.18-6.fc39.aarch64.rpm 105 MB/s | 1.1 MB 00:00 (23/152): findutils-4.9.0-5.fc39.aarch64.rpm 43 MB/s | 495 kB 00:00 (24/152): fonts-srpm-macros-2.0.5-12.fc39.noarc 3.5 MB/s | 26 kB 00:00 (25/152): fpc-srpm-macros-1.3-8.fc39.noarch.rpm 1.2 MB/s | 7.4 kB 00:00 (26/152): gawk-5.2.2-2.fc39.aarch64.rpm 107 MB/s | 1.1 MB 00:00 (27/152): gdbm-libs-1.23-4.fc39.aarch64.rpm 9.9 MB/s | 56 kB 00:00 (28/152): ghc-srpm-macros-1.6.1-2.fc39.noarch.r 1.3 MB/s | 7.8 kB 00:00 (29/152): gmp-6.2.1-5.fc39.aarch64.rpm 32 MB/s | 266 kB 00:00 (30/152): grep-3.11-3.fc39.aarch64.rpm 35 MB/s | 295 kB 00:00 (31/152): gnat-srpm-macros-6-3.fc39.noarch.rpm 967 kB/s | 8.8 kB 00:00 (32/152): info-7.0.3-3.fc39.aarch64.rpm 46 MB/s | 179 kB 00:00 (33/152): gzip-1.12-6.fc39.aarch64.rpm 29 MB/s | 164 kB 00:00 (34/152): jansson-2.13.1-7.fc39.aarch64.rpm 6.5 MB/s | 46 kB 00:00 (35/152): kernel-srpm-macros-1.0-20.fc39.noarch 1.9 MB/s | 10 kB 00:00 (36/152): libarchive-3.7.1-1.fc39.aarch64.rpm 68 MB/s | 402 kB 00:00 (37/152): libbrotli-1.1.0-1.fc39.aarch64.rpm 57 MB/s | 345 kB 00:00 (38/152): libattr-2.5.1-8.fc39.aarch64.rpm 2.0 MB/s | 18 kB 00:00 (39/152): libcap-ng-0.8.3-8.fc39.aarch64.rpm 2.4 MB/s | 32 kB 00:00 (40/152): libcom_err-1.47.0-2.fc39.aarch64.rpm 2.1 MB/s | 26 kB 00:00 (41/152): libevent-2.1.12-9.fc39.aarch64.rpm 50 MB/s | 254 kB 00:00 (42/152): libdb-5.3.28-56.fc39.aarch64.rpm 48 MB/s | 735 kB 00:00 (43/152): libffi-3.4.4-4.fc39.aarch64.rpm 6.3 MB/s | 38 kB 00:00 (44/152): libnsl2-2.0.0-6.fc39.aarch64.rpm 5.6 MB/s | 30 kB 00:00 (45/152): libpkgconf-1.9.5-2.fc39.aarch64.rpm 5.2 MB/s | 38 kB 00:00 (46/152): libpsl-0.21.2-4.fc39.aarch64.rpm 15 MB/s | 63 kB 00:00 (47/152): libpwquality-1.4.5-6.fc39.aarch64.rpm 19 MB/s | 120 kB 00:00 (48/152): libselinux-3.5-5.fc39.aarch64.rpm 13 MB/s | 86 kB 00:00 (49/152): 
libsemanage-3.5-4.fc39.aarch64.rpm 17 MB/s | 117 kB 00:00 (50/152): libsepol-3.5-2.fc39.aarch64.rpm 37 MB/s | 311 kB 00:00 (51/152): libsigsegv-2.14-5.fc39.aarch64.rpm 3.0 MB/s | 27 kB 00:00 (52/152): libtasn1-4.19.0-3.fc39.aarch64.rpm 7.9 MB/s | 73 kB 00:00 (53/152): libunistring-1.1-5.fc39.aarch64.rpm 82 MB/s | 540 kB 00:00 (54/152): libutempter-1.2.1-10.fc39.aarch64.rpm 1.8 MB/s | 27 kB 00:00 (55/152): libxcrypt-4.4.36-2.fc39.aarch64.rpm 9.7 MB/s | 123 kB 00:00 (56/152): libverto-0.3.2-6.fc39.aarch64.rpm 1.3 MB/s | 21 kB 00:00 (57/152): lua-libs-5.4.6-3.fc39.aarch64.rpm 16 MB/s | 131 kB 00:00 (58/152): lz4-libs-1.9.4-4.fc39.aarch64.rpm 7.0 MB/s | 68 kB 00:00 (59/152): libxml2-2.10.4-3.fc39.aarch64.rpm 53 MB/s | 689 kB 00:00 (60/152): mpfr-4.2.0-3.fc39.aarch64.rpm 56 MB/s | 319 kB 00:00 (61/152): ocaml-srpm-macros-8-2.fc39.noarch.rpm 3.3 MB/s | 14 kB 00:00 (62/152): openblas-srpm-macros-2-14.fc39.noarch 2.9 MB/s | 7.5 kB 00:00 (63/152): openldap-2.6.6-1.fc39.aarch64.rpm 94 MB/s | 251 kB 00:00 (64/152): openssl-libs-3.1.1-4.fc39.aarch64.rpm 248 MB/s | 2.0 MB 00:00 (65/152): package-notes-srpm-macros-0.5-9.fc39. 1.3 MB/s | 11 kB 00:00 (66/152): patch-2.7.6-22.fc39.aarch64.rpm 12 MB/s | 123 kB 00:00 (67/152): pcre2-10.42-1.fc39.2.aarch64.rpm 53 MB/s | 219 kB 00:00 (68/152): pcre2-syntax-10.42-1.fc39.2.noarch.rp 33 MB/s | 143 kB 00:00 (69/152): perl-srpm-macros-1-51.fc39.noarch.rpm 2.3 MB/s | 8.0 kB 00:00 (70/152): pkgconf-1.9.5-2.fc39.aarch64.rpm 11 MB/s | 42 kB 00:00 (71/152): pkgconf-m4-1.9.5-2.fc39.noarch.rpm 3.5 MB/s | 14 kB 00:00 (72/152): pkgconf-pkg-config-1.9.5-2.fc39.aarch 5.9 MB/s | 9.6 kB 00:00 (73/152): popt-1.19-3.fc39.aarch64.rpm 27 MB/s | 66 kB 00:00 (74/152): python-srpm-macros-3.12-4.fc39.noarch 8.2 MB/s | 25 kB 00:00 (75/152): sed-4.8-14.fc39.aarch64.rpm 75 MB/s | 304 kB 00:00 (76/152): setup-2.14.4-1.fc39.noarch.rpm 38 MB/s | 154 kB 00:00 (77/152): sqlite-libs-3.42.0-7.fc39.aarch64.rpm 101 MB/s | 677 kB 00:00 (78/152): tar-1.35-2.fc39.aarch64.rpm 113 MB/s | 854 kB 00:00 (79/152): unzip-6.0-62.fc39.aarch64.rpm 24 MB/s | 183 kB 00:00 (80/152): which-2.21-40.fc39.aarch64.rpm 9.9 MB/s | 42 kB 00:00 (81/152): xxhash-libs-0.8.2-1.fc39.aarch64.rpm 11 MB/s | 35 kB 00:00 (82/152): xz-5.4.4-1.fc39.aarch64.rpm 115 MB/s | 556 kB 00:00 (83/152): xz-libs-5.4.4-1.fc39.aarch64.rpm 20 MB/s | 106 kB 00:00 (84/152): zip-3.0-39.fc39.aarch64.rpm 42 MB/s | 262 kB 00:00 (85/152): zlib-1.2.13-4.fc39.aarch64.rpm 16 MB/s | 93 kB 00:00 (86/152): alternatives-1.26-1.fc39.aarch64.rpm 4.3 MB/s | 38 kB 00:00 (87/152): ansible-srpm-macros-1-12.fc39.noarch. 
2.1 MB/s | 21 kB 00:00 (88/152): audit-libs-3.1.2-8.fc39.aarch64.rpm 14 MB/s | 118 kB 00:00 (89/152): bash-5.2.26-1.fc39.aarch64.rpm 157 MB/s | 1.8 MB 00:00 (90/152): binutils-gold-2.40-14.fc39.aarch64.rp 88 MB/s | 945 kB 00:00 (91/152): coreutils-9.3-5.fc39.aarch64.rpm 31 MB/s | 1.2 MB 00:00 (92/152): coreutils-common-9.3-5.fc39.aarch64.r 60 MB/s | 2.1 MB 00:00 (93/152): crypto-policies-20231204-1.git1e3a2e4 4.8 MB/s | 100 kB 00:00 (94/152): binutils-2.40-14.fc39.aarch64.rpm 79 MB/s | 6.1 MB 00:00 (95/152): curl-8.2.1-4.fc39.aarch64.rpm 11 MB/s | 341 kB 00:00 (96/152): debugedit-5.0-12.fc39.aarch64.rpm 2.8 MB/s | 78 kB 00:00 (97/152): elfutils-0.191-2.fc39.aarch64.rpm 29 MB/s | 560 kB 00:00 (98/152): elfutils-debuginfod-client-0.191-2.fc 1.5 MB/s | 38 kB 00:00 (99/152): elfutils-default-yama-scope-0.191-2.f 698 kB/s | 13 kB 00:00 (100/152): elfutils-libelf-0.191-2.fc39.aarch64 11 MB/s | 209 kB 00:00 (101/152): elfutils-libs-0.191-2.fc39.aarch64.r 17 MB/s | 263 kB 00:00 (102/152): fedora-release-39-36.noarch.rpm 405 kB/s | 8.6 kB 00:00 (103/152): fedora-release-common-39-36.noarch.r 939 kB/s | 19 kB 00:00 (104/152): fedora-release-identity-basic-39-36. 515 kB/s | 9.4 kB 00:00 (105/152): forge-srpm-macros-0.2.0-3.fc39.noarc 852 kB/s | 19 kB 00:00 (106/152): go-srpm-macros-3.5.0-1.fc39.noarch.r 1.0 MB/s | 28 kB 00:00 (107/152): gdb-minimal-14.2-1.fc39.aarch64.rpm 89 MB/s | 3.9 MB 00:00 (108/152): krb5-libs-1.21.2-3.fc39.aarch64.rpm 35 MB/s | 770 kB 00:00 (109/152): keyutils-libs-1.6.3-1.fc39.aarch64.r 1.1 MB/s | 32 kB 00:00 (110/152): libacl-2.3.1-9.fc39.aarch64.rpm 2.3 MB/s | 24 kB 00:00 (111/152): libblkid-2.39.4-1.fc39.aarch64.rpm 20 MB/s | 116 kB 00:00 (112/152): libcap-2.48-9.fc39.aarch64.rpm 11 MB/s | 69 kB 00:00 (113/152): libcurl-8.2.1-4.fc39.aarch64.rpm 52 MB/s | 317 kB 00:00 (114/152): libeconf-0.5.2-2.fc39.aarch64.rpm 5.3 MB/s | 30 kB 00:00 (115/152): libfdisk-2.39.4-1.fc39.aarch64.rpm 26 MB/s | 157 kB 00:00 (116/152): libgcc-13.2.1-7.fc39.aarch64.rpm 26 MB/s | 99 kB 00:00 (117/152): libgomp-13.2.1-7.fc39.aarch64.rpm 53 MB/s | 316 kB 00:00 (118/152): libidn2-2.3.7-1.fc39.aarch64.rpm 16 MB/s | 120 kB 00:00 (119/152): libmount-2.39.4-1.fc39.aarch64.rpm 16 MB/s | 153 kB 00:00 (120/152): libnghttp2-1.55.1-4.fc39.aarch64.rpm 10 MB/s | 76 kB 00:00 (121/152): libsmartcols-2.39.4-1.fc39.aarch64.r 9.8 MB/s | 65 kB 00:00 (122/152): libssh-0.10.6-2.fc39.aarch64.rpm 25 MB/s | 213 kB 00:00 (123/152): libssh-config-0.10.6-2.fc39.noarch.r 1.0 MB/s | 9.0 kB 00:00 (124/152): libstdc++-13.2.1-7.fc39.aarch64.rpm 83 MB/s | 818 kB 00:00 (125/152): libtirpc-1.3.4-1.rc3.fc39.aarch64.rp 14 MB/s | 94 kB 00:00 (126/152): libuuid-2.39.4-1.fc39.aarch64.rpm 4.9 MB/s | 28 kB 00:00 (127/152): libzstd-1.5.6-1.fc39.aarch64.rpm 51 MB/s | 284 kB 00:00 (128/152): lua-srpm-macros-1-13.fc39.noarch.rpm 1.5 MB/s | 8.7 kB 00:00 (129/152): ncurses-base-6.4-7.20230520.fc39.1.n 16 MB/s | 88 kB 00:00 (130/152): ncurses-libs-6.4-7.20230520.fc39.1.a 39 MB/s | 326 kB 00:00 (131/152): p11-kit-0.25.3-1.fc39.aarch64.rpm 37 MB/s | 495 kB 00:00 (132/152): p11-kit-trust-0.25.3-1.fc39.aarch64. 10 MB/s | 141 kB 00:00 (133/152): pam-1.5.3-3.fc39.aarch64.rpm 47 MB/s | 552 kB 00:00 (134/152): pam-libs-1.5.3-3.fc39.aarch64.rpm 6.1 MB/s | 57 kB 00:00 (135/152): publicsuffix-list-dafsa-20240107-1.f 6.4 MB/s | 58 kB 00:00 (136/152): pyproject-srpm-macros-1.12.0-1.fc39. 1.6 MB/s | 14 kB 00:00 (137/152): qt5-srpm-macros-5.15.12-1.fc39.noarc 977 kB/s | 8.4 kB 00:00 (138/152): qt6-srpm-macros-6.6.2-1.fc39.noarch. 
1.0 MB/s | 8.9 kB 00:00 (139/152): readline-8.2-6.fc39.aarch64.rpm 23 MB/s | 212 kB 00:00 (140/152): redhat-rpm-config-266-1.fc39.noarch. 6.7 MB/s | 78 kB 00:00 (141/152): rpm-4.19.1.1-1.fc39.aarch64.rpm 40 MB/s | 536 kB 00:00 (142/152): rpm-build-4.19.1.1-1.fc39.aarch64.rp 6.1 MB/s | 79 kB 00:00 (143/152): rpm-build-libs-4.19.1.1-1.fc39.aarch 8.7 MB/s | 91 kB 00:00 (144/152): rpm-libs-4.19.1.1-1.fc39.aarch64.rpm 33 MB/s | 305 kB 00:00 (145/152): rpm-sequoia-1.6.0-1.fc39.aarch64.rpm 55 MB/s | 817 kB 00:00 (146/152): rust-srpm-macros-26.2-1.fc39.noarch. 907 kB/s | 13 kB 00:00 (147/152): shadow-utils-4.14.0-2.fc39.aarch64.r 73 MB/s | 1.3 MB 00:00 (148/152): systemd-libs-254.10-1.fc39.aarch64.r 40 MB/s | 665 kB 00:00 (149/152): rpmautospec-rpm-macros-0.6.3-1.fc39. 319 kB/s | 10 kB 00:00 (150/152): util-linux-core-2.39.4-1.fc39.aarch6 44 MB/s | 505 kB 00:00 (151/152): zstd-1.5.6-1.fc39.aarch64.rpm 36 MB/s | 445 kB 00:00 (152/152): util-linux-2.39.4-1.fc39.aarch64.rpm 51 MB/s | 1.2 MB 00:00 -------------------------------------------------------------------------------- Total 58 MB/s | 52 MB 00:00 fedora 1.6 MB/s | 1.6 kB 00:00 Importing GPG key 0x18B8E74C: Userid : "Fedora (39) " Fingerprint: E8F2 3996 F232 1864 0CB4 4CBE 75CF 5AC4 18B8 E74C From : /usr/share/distribution-gpg-keys/fedora/RPM-GPG-KEY-fedora-39-primary Key imported successfully Running transaction check Transaction check succeeded. Running transaction test Transaction test succeeded. Running transaction Running scriptlet: filesystem-3.18-6.fc39.aarch64 1/1 Preparing : 1/1 Installing : libgcc-13.2.1-7.fc39.aarch64 1/152 Running scriptlet: libgcc-13.2.1-7.fc39.aarch64 1/152 Installing : crypto-policies-20231204-1.git1e3a2e4.fc39.noarc 2/152 Running scriptlet: crypto-policies-20231204-1.git1e3a2e4.fc39.noarc 2/152 Installing : fedora-release-identity-basic-39-36.noarch 3/152 Installing : fedora-gpg-keys-39-1.noarch 4/152 Installing : fedora-repos-39-1.noarch 5/152 Installing : fedora-release-common-39-36.noarch 6/152 Installing : fedora-release-39-36.noarch 7/152 Installing : setup-2.14.4-1.fc39.noarch 8/152 Running scriptlet: setup-2.14.4-1.fc39.noarch 8/152 Installing : filesystem-3.18-6.fc39.aarch64 9/152 Installing : basesystem-11-18.fc39.noarch 10/152 Installing : rust-srpm-macros-26.2-1.fc39.noarch 11/152 Installing : qt6-srpm-macros-6.6.2-1.fc39.noarch 12/152 Installing : qt5-srpm-macros-5.15.12-1.fc39.noarch 13/152 Installing : publicsuffix-list-dafsa-20240107-1.fc39.noarch 14/152 Installing : ncurses-base-6.4-7.20230520.fc39.1.noarch 15/152 Installing : glibc-gconv-extra-2.38-99.fc39.aarch64 16/152 Running scriptlet: glibc-gconv-extra-2.38-99.fc39.aarch64 16/152 Installing : glibc-minimal-langpack-2.38-99.fc39.aarch64 17/152 Installing : glibc-common-2.38-99.fc39.aarch64 18/152 Running scriptlet: glibc-2.38-99.fc39.aarch64 19/152 Installing : glibc-2.38-99.fc39.aarch64 19/152 Running scriptlet: glibc-2.38-99.fc39.aarch64 19/152 Installing : ncurses-libs-6.4-7.20230520.fc39.1.aarch64 20/152 Installing : bash-5.2.26-1.fc39.aarch64 21/152 Running scriptlet: bash-5.2.26-1.fc39.aarch64 21/152 Installing : zlib-1.2.13-4.fc39.aarch64 22/152 Installing : xz-libs-5.4.4-1.fc39.aarch64 23/152 Installing : bzip2-libs-1.0.8-16.fc39.aarch64 24/152 Installing : popt-1.19-3.fc39.aarch64 25/152 Installing : libstdc++-13.2.1-7.fc39.aarch64 26/152 Installing : libuuid-2.39.4-1.fc39.aarch64 27/152 Installing : libzstd-1.5.6-1.fc39.aarch64 28/152 Installing : elfutils-libelf-0.191-2.fc39.aarch64 29/152 Installing : 
libblkid-2.39.4-1.fc39.aarch64 30/152 Installing : readline-8.2-6.fc39.aarch64 31/152 Installing : gmp-1:6.2.1-5.fc39.aarch64 32/152 Installing : libattr-2.5.1-8.fc39.aarch64 33/152 Installing : libacl-2.3.1-9.fc39.aarch64 34/152 Installing : libxcrypt-4.4.36-2.fc39.aarch64 35/152 Installing : libcap-2.48-9.fc39.aarch64 36/152 Installing : lz4-libs-1.9.4-4.fc39.aarch64 37/152 Installing : libeconf-0.5.2-2.fc39.aarch64 38/152 Installing : systemd-libs-254.10-1.fc39.aarch64 39/152 Installing : mpfr-4.2.0-3.fc39.aarch64 40/152 Installing : dwz-0.15-3.fc39.aarch64 41/152 Installing : unzip-6.0-62.fc39.aarch64 42/152 Installing : file-libs-5.44-5.fc39.aarch64 43/152 Installing : file-5.44-5.fc39.aarch64 44/152 Installing : jansson-2.13.1-7.fc39.aarch64 45/152 Installing : libcap-ng-0.8.3-8.fc39.aarch64 46/152 Installing : audit-libs-3.1.2-8.fc39.aarch64 47/152 Installing : pam-libs-1.5.3-3.fc39.aarch64 48/152 Installing : libcom_err-1.47.0-2.fc39.aarch64 49/152 Installing : libsepol-3.5-2.fc39.aarch64 50/152 Installing : libtasn1-4.19.0-3.fc39.aarch64 51/152 Installing : libunistring-1.1-5.fc39.aarch64 52/152 Installing : libidn2-2.3.7-1.fc39.aarch64 53/152 Installing : lua-libs-5.4.6-3.fc39.aarch64 54/152 Installing : alternatives-1.26-1.fc39.aarch64 55/152 Installing : libsmartcols-2.39.4-1.fc39.aarch64 56/152 Installing : libpsl-0.21.2-4.fc39.aarch64 57/152 Installing : zip-3.0-39.fc39.aarch64 58/152 Installing : zstd-1.5.6-1.fc39.aarch64 59/152 Installing : libfdisk-2.39.4-1.fc39.aarch64 60/152 Installing : bzip2-1.0.8-16.fc39.aarch64 61/152 Installing : libxml2-2.10.4-3.fc39.aarch64 62/152 Installing : sqlite-libs-3.42.0-7.fc39.aarch64 63/152 Installing : ed-1.19-4.fc39.aarch64 64/152 Installing : elfutils-default-yama-scope-0.191-2.fc39.noarch 65/152 Running scriptlet: elfutils-default-yama-scope-0.191-2.fc39.noarch 65/152 Installing : cpio-2.14-4.fc39.aarch64 66/152 Installing : diffutils-3.10-3.fc39.aarch64 67/152 Installing : gdbm-libs-1:1.23-4.fc39.aarch64 68/152 Installing : cyrus-sasl-lib-2.1.28-11.fc39.aarch64 69/152 Installing : libbrotli-1.1.0-1.fc39.aarch64 70/152 Installing : libdb-5.3.28-56.fc39.aarch64 71/152 Installing : libffi-3.4.4-4.fc39.aarch64 72/152 Installing : p11-kit-0.25.3-1.fc39.aarch64 73/152 Installing : p11-kit-trust-0.25.3-1.fc39.aarch64 74/152 Running scriptlet: p11-kit-trust-0.25.3-1.fc39.aarch64 74/152 Installing : libpkgconf-1.9.5-2.fc39.aarch64 75/152 Installing : pkgconf-1.9.5-2.fc39.aarch64 76/152 Installing : libsigsegv-2.14-5.fc39.aarch64 77/152 Installing : gawk-5.2.2-2.fc39.aarch64 78/152 Installing : libverto-0.3.2-6.fc39.aarch64 79/152 Installing : xxhash-libs-0.8.2-1.fc39.aarch64 80/152 Installing : keyutils-libs-1.6.3-1.fc39.aarch64 81/152 Installing : libgomp-13.2.1-7.fc39.aarch64 82/152 Installing : libnghttp2-1.55.1-4.fc39.aarch64 83/152 Installing : libssh-config-0.10.6-2.fc39.noarch 84/152 Installing : coreutils-common-9.3-5.fc39.aarch64 85/152 Installing : ansible-srpm-macros-1-12.fc39.noarch 86/152 Installing : pkgconf-m4-1.9.5-2.fc39.noarch 87/152 Installing : pkgconf-pkg-config-1.9.5-2.fc39.aarch64 88/152 Installing : perl-srpm-macros-1-51.fc39.noarch 89/152 Installing : pcre2-syntax-10.42-1.fc39.2.noarch 90/152 Installing : pcre2-10.42-1.fc39.2.aarch64 91/152 Installing : libselinux-3.5-5.fc39.aarch64 92/152 Installing : sed-4.8-14.fc39.aarch64 93/152 Installing : grep-3.11-3.fc39.aarch64 94/152 Installing : findutils-1:4.9.0-5.fc39.aarch64 95/152 Installing : xz-5.4.4-1.fc39.aarch64 96/152 Installing : libmount-2.39.4-1.fc39.aarch64 
97/152 Installing : util-linux-core-2.39.4-1.fc39.aarch64 98/152 Installing : openssl-libs-1:3.1.1-4.fc39.aarch64 99/152 Installing : coreutils-9.3-5.fc39.aarch64 100/152 Running scriptlet: ca-certificates-2023.2.60_v7.0.306-2.fc39.noarch 101/152 Installing : ca-certificates-2023.2.60_v7.0.306-2.fc39.noarch 101/152 Running scriptlet: ca-certificates-2023.2.60_v7.0.306-2.fc39.noarch 101/152 Installing : krb5-libs-1.21.2-3.fc39.aarch64 102/152 Installing : libtirpc-1.3.4-1.rc3.fc39.aarch64 103/152 Running scriptlet: authselect-libs-1.4.3-1.fc39.aarch64 104/152 Installing : authselect-libs-1.4.3-1.fc39.aarch64 104/152 Installing : gzip-1.12-6.fc39.aarch64 105/152 Installing : libarchive-3.7.1-1.fc39.aarch64 106/152 Installing : cracklib-2.9.11-2.fc39.aarch64 107/152 Installing : libpwquality-1.4.5-6.fc39.aarch64 108/152 Installing : authselect-1.4.3-1.fc39.aarch64 109/152 Installing : libnsl2-2.0.0-6.fc39.aarch64 110/152 Installing : pam-1.5.3-3.fc39.aarch64 111/152 Installing : libssh-0.10.6-2.fc39.aarch64 112/152 Installing : libevent-2.1.12-9.fc39.aarch64 113/152 Installing : openldap-2.6.6-1.fc39.aarch64 114/152 Installing : libcurl-8.2.1-4.fc39.aarch64 115/152 Installing : elfutils-libs-0.191-2.fc39.aarch64 116/152 Installing : elfutils-debuginfod-client-0.191-2.fc39.aarch64 117/152 Installing : binutils-gold-2.40-14.fc39.aarch64 118/152 Running scriptlet: binutils-gold-2.40-14.fc39.aarch64 118/152 Installing : binutils-2.40-14.fc39.aarch64 119/152 Running scriptlet: binutils-2.40-14.fc39.aarch64 119/152 Installing : elfutils-0.191-2.fc39.aarch64 120/152 Installing : gdb-minimal-14.2-1.fc39.aarch64 121/152 Installing : debugedit-5.0-12.fc39.aarch64 122/152 Installing : curl-8.2.1-4.fc39.aarch64 123/152 Installing : rpm-sequoia-1.6.0-1.fc39.aarch64 124/152 Installing : rpm-libs-4.19.1.1-1.fc39.aarch64 125/152 Running scriptlet: rpm-4.19.1.1-1.fc39.aarch64 126/152 Installing : rpm-4.19.1.1-1.fc39.aarch64 126/152 Installing : efi-srpm-macros-5-9.fc39.noarch 127/152 Installing : lua-srpm-macros-1-13.fc39.noarch 128/152 Installing : rpmautospec-rpm-macros-0.6.3-1.fc39.noarch 129/152 Installing : rpm-build-libs-4.19.1.1-1.fc39.aarch64 130/152 Installing : libsemanage-3.5-4.fc39.aarch64 131/152 Installing : shadow-utils-2:4.14.0-2.fc39.aarch64 132/152 Running scriptlet: libutempter-1.2.1-10.fc39.aarch64 133/152 Installing : libutempter-1.2.1-10.fc39.aarch64 133/152 Installing : patch-2.7.6-22.fc39.aarch64 134/152 Installing : tar-2:1.35-2.fc39.aarch64 135/152 Installing : package-notes-srpm-macros-0.5-9.fc39.noarch 136/152 Installing : openblas-srpm-macros-2-14.fc39.noarch 137/152 Installing : ocaml-srpm-macros-8-2.fc39.noarch 138/152 Installing : kernel-srpm-macros-1.0-20.fc39.noarch 139/152 Installing : gnat-srpm-macros-6-3.fc39.noarch 140/152 Installing : ghc-srpm-macros-1.6.1-2.fc39.noarch 141/152 Installing : fpc-srpm-macros-1.3-8.fc39.noarch 142/152 Installing : fonts-srpm-macros-1:2.0.5-12.fc39.noarch 143/152 Installing : python-srpm-macros-3.12-4.fc39.noarch 144/152 Installing : forge-srpm-macros-0.2.0-3.fc39.noarch 145/152 Installing : go-srpm-macros-3.5.0-1.fc39.noarch 146/152 Installing : redhat-rpm-config-266-1.fc39.noarch 147/152 Installing : rpm-build-4.19.1.1-1.fc39.aarch64 148/152 Installing : pyproject-srpm-macros-1.12.0-1.fc39.noarch 149/152 Installing : util-linux-2.39.4-1.fc39.aarch64 150/152 Running scriptlet: util-linux-2.39.4-1.fc39.aarch64 150/152 Installing : which-2.21-40.fc39.aarch64 151/152 Installing : info-7.0.3-3.fc39.aarch64 152/152 Running scriptlet: 
filesystem-3.18-6.fc39.aarch64 152/152 Running scriptlet: ca-certificates-2023.2.60_v7.0.306-2.fc39.noarch 152/152 Running scriptlet: authselect-libs-1.4.3-1.fc39.aarch64 152/152 Running scriptlet: rpm-4.19.1.1-1.fc39.aarch64 152/152 Running scriptlet: info-7.0.3-3.fc39.aarch64 152/152 Verifying : glibc-2.38-99.fc39.aarch64 1/152 Verifying : glibc-common-2.38-99.fc39.aarch64 2/152 Verifying : glibc-gconv-extra-2.38-99.fc39.aarch64 3/152 Verifying : glibc-minimal-langpack-2.38-99.fc39.aarch64 4/152 Verifying : authselect-1.4.3-1.fc39.aarch64 5/152 Verifying : authselect-libs-1.4.3-1.fc39.aarch64 6/152 Verifying : basesystem-11-18.fc39.noarch 7/152 Verifying : bzip2-1.0.8-16.fc39.aarch64 8/152 Verifying : bzip2-libs-1.0.8-16.fc39.aarch64 9/152 Verifying : ca-certificates-2023.2.60_v7.0.306-2.fc39.noarch 10/152 Verifying : cpio-2.14-4.fc39.aarch64 11/152 Verifying : cracklib-2.9.11-2.fc39.aarch64 12/152 Verifying : cyrus-sasl-lib-2.1.28-11.fc39.aarch64 13/152 Verifying : diffutils-3.10-3.fc39.aarch64 14/152 Verifying : dwz-0.15-3.fc39.aarch64 15/152 Verifying : ed-1.19-4.fc39.aarch64 16/152 Verifying : efi-srpm-macros-5-9.fc39.noarch 17/152 Verifying : fedora-gpg-keys-39-1.noarch 18/152 Verifying : fedora-repos-39-1.noarch 19/152 Verifying : file-5.44-5.fc39.aarch64 20/152 Verifying : file-libs-5.44-5.fc39.aarch64 21/152 Verifying : filesystem-3.18-6.fc39.aarch64 22/152 Verifying : findutils-1:4.9.0-5.fc39.aarch64 23/152 Verifying : fonts-srpm-macros-1:2.0.5-12.fc39.noarch 24/152 Verifying : fpc-srpm-macros-1.3-8.fc39.noarch 25/152 Verifying : gawk-5.2.2-2.fc39.aarch64 26/152 Verifying : gdbm-libs-1:1.23-4.fc39.aarch64 27/152 Verifying : ghc-srpm-macros-1.6.1-2.fc39.noarch 28/152 Verifying : gmp-1:6.2.1-5.fc39.aarch64 29/152 Verifying : gnat-srpm-macros-6-3.fc39.noarch 30/152 Verifying : grep-3.11-3.fc39.aarch64 31/152 Verifying : gzip-1.12-6.fc39.aarch64 32/152 Verifying : info-7.0.3-3.fc39.aarch64 33/152 Verifying : jansson-2.13.1-7.fc39.aarch64 34/152 Verifying : kernel-srpm-macros-1.0-20.fc39.noarch 35/152 Verifying : libarchive-3.7.1-1.fc39.aarch64 36/152 Verifying : libattr-2.5.1-8.fc39.aarch64 37/152 Verifying : libbrotli-1.1.0-1.fc39.aarch64 38/152 Verifying : libcap-ng-0.8.3-8.fc39.aarch64 39/152 Verifying : libcom_err-1.47.0-2.fc39.aarch64 40/152 Verifying : libdb-5.3.28-56.fc39.aarch64 41/152 Verifying : libevent-2.1.12-9.fc39.aarch64 42/152 Verifying : libffi-3.4.4-4.fc39.aarch64 43/152 Verifying : libnsl2-2.0.0-6.fc39.aarch64 44/152 Verifying : libpkgconf-1.9.5-2.fc39.aarch64 45/152 Verifying : libpsl-0.21.2-4.fc39.aarch64 46/152 Verifying : libpwquality-1.4.5-6.fc39.aarch64 47/152 Verifying : libselinux-3.5-5.fc39.aarch64 48/152 Verifying : libsemanage-3.5-4.fc39.aarch64 49/152 Verifying : libsepol-3.5-2.fc39.aarch64 50/152 Verifying : libsigsegv-2.14-5.fc39.aarch64 51/152 Verifying : libtasn1-4.19.0-3.fc39.aarch64 52/152 Verifying : libunistring-1.1-5.fc39.aarch64 53/152 Verifying : libutempter-1.2.1-10.fc39.aarch64 54/152 Verifying : libverto-0.3.2-6.fc39.aarch64 55/152 Verifying : libxcrypt-4.4.36-2.fc39.aarch64 56/152 Verifying : libxml2-2.10.4-3.fc39.aarch64 57/152 Verifying : lua-libs-5.4.6-3.fc39.aarch64 58/152 Verifying : lz4-libs-1.9.4-4.fc39.aarch64 59/152 Verifying : mpfr-4.2.0-3.fc39.aarch64 60/152 Verifying : ocaml-srpm-macros-8-2.fc39.noarch 61/152 Verifying : openblas-srpm-macros-2-14.fc39.noarch 62/152 Verifying : openldap-2.6.6-1.fc39.aarch64 63/152 Verifying : openssl-libs-1:3.1.1-4.fc39.aarch64 64/152 Verifying : package-notes-srpm-macros-0.5-9.fc39.noarch 
65/152 Verifying : patch-2.7.6-22.fc39.aarch64 66/152 Verifying : pcre2-10.42-1.fc39.2.aarch64 67/152 Verifying : pcre2-syntax-10.42-1.fc39.2.noarch 68/152 Verifying : perl-srpm-macros-1-51.fc39.noarch 69/152 Verifying : pkgconf-1.9.5-2.fc39.aarch64 70/152 Verifying : pkgconf-m4-1.9.5-2.fc39.noarch 71/152 Verifying : pkgconf-pkg-config-1.9.5-2.fc39.aarch64 72/152 Verifying : popt-1.19-3.fc39.aarch64 73/152 Verifying : python-srpm-macros-3.12-4.fc39.noarch 74/152 Verifying : sed-4.8-14.fc39.aarch64 75/152 Verifying : setup-2.14.4-1.fc39.noarch 76/152 Verifying : sqlite-libs-3.42.0-7.fc39.aarch64 77/152 Verifying : tar-2:1.35-2.fc39.aarch64 78/152 Verifying : unzip-6.0-62.fc39.aarch64 79/152 Verifying : which-2.21-40.fc39.aarch64 80/152 Verifying : xxhash-libs-0.8.2-1.fc39.aarch64 81/152 Verifying : xz-5.4.4-1.fc39.aarch64 82/152 Verifying : xz-libs-5.4.4-1.fc39.aarch64 83/152 Verifying : zip-3.0-39.fc39.aarch64 84/152 Verifying : zlib-1.2.13-4.fc39.aarch64 85/152 Verifying : alternatives-1.26-1.fc39.aarch64 86/152 Verifying : ansible-srpm-macros-1-12.fc39.noarch 87/152 Verifying : audit-libs-3.1.2-8.fc39.aarch64 88/152 Verifying : bash-5.2.26-1.fc39.aarch64 89/152 Verifying : binutils-2.40-14.fc39.aarch64 90/152 Verifying : binutils-gold-2.40-14.fc39.aarch64 91/152 Verifying : coreutils-9.3-5.fc39.aarch64 92/152 Verifying : coreutils-common-9.3-5.fc39.aarch64 93/152 Verifying : crypto-policies-20231204-1.git1e3a2e4.fc39.noarc 94/152 Verifying : curl-8.2.1-4.fc39.aarch64 95/152 Verifying : debugedit-5.0-12.fc39.aarch64 96/152 Verifying : elfutils-0.191-2.fc39.aarch64 97/152 Verifying : elfutils-debuginfod-client-0.191-2.fc39.aarch64 98/152 Verifying : elfutils-default-yama-scope-0.191-2.fc39.noarch 99/152 Verifying : elfutils-libelf-0.191-2.fc39.aarch64 100/152 Verifying : elfutils-libs-0.191-2.fc39.aarch64 101/152 Verifying : fedora-release-39-36.noarch 102/152 Verifying : fedora-release-common-39-36.noarch 103/152 Verifying : fedora-release-identity-basic-39-36.noarch 104/152 Verifying : forge-srpm-macros-0.2.0-3.fc39.noarch 105/152 Verifying : gdb-minimal-14.2-1.fc39.aarch64 106/152 Verifying : go-srpm-macros-3.5.0-1.fc39.noarch 107/152 Verifying : keyutils-libs-1.6.3-1.fc39.aarch64 108/152 Verifying : krb5-libs-1.21.2-3.fc39.aarch64 109/152 Verifying : libacl-2.3.1-9.fc39.aarch64 110/152 Verifying : libblkid-2.39.4-1.fc39.aarch64 111/152 Verifying : libcap-2.48-9.fc39.aarch64 112/152 Verifying : libcurl-8.2.1-4.fc39.aarch64 113/152 Verifying : libeconf-0.5.2-2.fc39.aarch64 114/152 Verifying : libfdisk-2.39.4-1.fc39.aarch64 115/152 Verifying : libgcc-13.2.1-7.fc39.aarch64 116/152 Verifying : libgomp-13.2.1-7.fc39.aarch64 117/152 Verifying : libidn2-2.3.7-1.fc39.aarch64 118/152 Verifying : libmount-2.39.4-1.fc39.aarch64 119/152 Verifying : libnghttp2-1.55.1-4.fc39.aarch64 120/152 Verifying : libsmartcols-2.39.4-1.fc39.aarch64 121/152 Verifying : libssh-0.10.6-2.fc39.aarch64 122/152 Verifying : libssh-config-0.10.6-2.fc39.noarch 123/152 Verifying : libstdc++-13.2.1-7.fc39.aarch64 124/152 Verifying : libtirpc-1.3.4-1.rc3.fc39.aarch64 125/152 Verifying : libuuid-2.39.4-1.fc39.aarch64 126/152 Verifying : libzstd-1.5.6-1.fc39.aarch64 127/152 Verifying : lua-srpm-macros-1-13.fc39.noarch 128/152 Verifying : ncurses-base-6.4-7.20230520.fc39.1.noarch 129/152 Verifying : ncurses-libs-6.4-7.20230520.fc39.1.aarch64 130/152 Verifying : p11-kit-0.25.3-1.fc39.aarch64 131/152 Verifying : p11-kit-trust-0.25.3-1.fc39.aarch64 132/152 Verifying : pam-1.5.3-3.fc39.aarch64 133/152 Verifying : 
pam-libs-1.5.3-3.fc39.aarch64 134/152 Verifying : publicsuffix-list-dafsa-20240107-1.fc39.noarch 135/152 Verifying : pyproject-srpm-macros-1.12.0-1.fc39.noarch 136/152 Verifying : qt5-srpm-macros-5.15.12-1.fc39.noarch 137/152 Verifying : qt6-srpm-macros-6.6.2-1.fc39.noarch 138/152 Verifying : readline-8.2-6.fc39.aarch64 139/152 Verifying : redhat-rpm-config-266-1.fc39.noarch 140/152 Verifying : rpm-4.19.1.1-1.fc39.aarch64 141/152 Verifying : rpm-build-4.19.1.1-1.fc39.aarch64 142/152 Verifying : rpm-build-libs-4.19.1.1-1.fc39.aarch64 143/152 Verifying : rpm-libs-4.19.1.1-1.fc39.aarch64 144/152 Verifying : rpm-sequoia-1.6.0-1.fc39.aarch64 145/152 Verifying : rpmautospec-rpm-macros-0.6.3-1.fc39.noarch 146/152 Verifying : rust-srpm-macros-26.2-1.fc39.noarch 147/152 Verifying : shadow-utils-2:4.14.0-2.fc39.aarch64 148/152 Verifying : systemd-libs-254.10-1.fc39.aarch64 149/152 Verifying : util-linux-2.39.4-1.fc39.aarch64 150/152 Verifying : util-linux-core-2.39.4-1.fc39.aarch64 151/152 Verifying : zstd-1.5.6-1.fc39.aarch64 152/152 Installed: alternatives-1.26-1.fc39.aarch64 ansible-srpm-macros-1-12.fc39.noarch audit-libs-3.1.2-8.fc39.aarch64 authselect-1.4.3-1.fc39.aarch64 authselect-libs-1.4.3-1.fc39.aarch64 basesystem-11-18.fc39.noarch bash-5.2.26-1.fc39.aarch64 binutils-2.40-14.fc39.aarch64 binutils-gold-2.40-14.fc39.aarch64 bzip2-1.0.8-16.fc39.aarch64 bzip2-libs-1.0.8-16.fc39.aarch64 ca-certificates-2023.2.60_v7.0.306-2.fc39.noarch coreutils-9.3-5.fc39.aarch64 coreutils-common-9.3-5.fc39.aarch64 cpio-2.14-4.fc39.aarch64 cracklib-2.9.11-2.fc39.aarch64 crypto-policies-20231204-1.git1e3a2e4.fc39.noarch curl-8.2.1-4.fc39.aarch64 cyrus-sasl-lib-2.1.28-11.fc39.aarch64 debugedit-5.0-12.fc39.aarch64 diffutils-3.10-3.fc39.aarch64 dwz-0.15-3.fc39.aarch64 ed-1.19-4.fc39.aarch64 efi-srpm-macros-5-9.fc39.noarch elfutils-0.191-2.fc39.aarch64 elfutils-debuginfod-client-0.191-2.fc39.aarch64 elfutils-default-yama-scope-0.191-2.fc39.noarch elfutils-libelf-0.191-2.fc39.aarch64 elfutils-libs-0.191-2.fc39.aarch64 fedora-gpg-keys-39-1.noarch fedora-release-39-36.noarch fedora-release-common-39-36.noarch fedora-release-identity-basic-39-36.noarch fedora-repos-39-1.noarch file-5.44-5.fc39.aarch64 file-libs-5.44-5.fc39.aarch64 filesystem-3.18-6.fc39.aarch64 findutils-1:4.9.0-5.fc39.aarch64 fonts-srpm-macros-1:2.0.5-12.fc39.noarch forge-srpm-macros-0.2.0-3.fc39.noarch fpc-srpm-macros-1.3-8.fc39.noarch gawk-5.2.2-2.fc39.aarch64 gdb-minimal-14.2-1.fc39.aarch64 gdbm-libs-1:1.23-4.fc39.aarch64 ghc-srpm-macros-1.6.1-2.fc39.noarch glibc-2.38-99.fc39.aarch64 glibc-common-2.38-99.fc39.aarch64 glibc-gconv-extra-2.38-99.fc39.aarch64 glibc-minimal-langpack-2.38-99.fc39.aarch64 gmp-1:6.2.1-5.fc39.aarch64 gnat-srpm-macros-6-3.fc39.noarch go-srpm-macros-3.5.0-1.fc39.noarch grep-3.11-3.fc39.aarch64 gzip-1.12-6.fc39.aarch64 info-7.0.3-3.fc39.aarch64 jansson-2.13.1-7.fc39.aarch64 kernel-srpm-macros-1.0-20.fc39.noarch keyutils-libs-1.6.3-1.fc39.aarch64 krb5-libs-1.21.2-3.fc39.aarch64 libacl-2.3.1-9.fc39.aarch64 libarchive-3.7.1-1.fc39.aarch64 libattr-2.5.1-8.fc39.aarch64 libblkid-2.39.4-1.fc39.aarch64 libbrotli-1.1.0-1.fc39.aarch64 libcap-2.48-9.fc39.aarch64 libcap-ng-0.8.3-8.fc39.aarch64 libcom_err-1.47.0-2.fc39.aarch64 libcurl-8.2.1-4.fc39.aarch64 libdb-5.3.28-56.fc39.aarch64 libeconf-0.5.2-2.fc39.aarch64 libevent-2.1.12-9.fc39.aarch64 libfdisk-2.39.4-1.fc39.aarch64 libffi-3.4.4-4.fc39.aarch64 libgcc-13.2.1-7.fc39.aarch64 libgomp-13.2.1-7.fc39.aarch64 libidn2-2.3.7-1.fc39.aarch64 libmount-2.39.4-1.fc39.aarch64 
libnghttp2-1.55.1-4.fc39.aarch64 libnsl2-2.0.0-6.fc39.aarch64 libpkgconf-1.9.5-2.fc39.aarch64 libpsl-0.21.2-4.fc39.aarch64 libpwquality-1.4.5-6.fc39.aarch64 libselinux-3.5-5.fc39.aarch64 libsemanage-3.5-4.fc39.aarch64 libsepol-3.5-2.fc39.aarch64 libsigsegv-2.14-5.fc39.aarch64 libsmartcols-2.39.4-1.fc39.aarch64 libssh-0.10.6-2.fc39.aarch64 libssh-config-0.10.6-2.fc39.noarch libstdc++-13.2.1-7.fc39.aarch64 libtasn1-4.19.0-3.fc39.aarch64 libtirpc-1.3.4-1.rc3.fc39.aarch64 libunistring-1.1-5.fc39.aarch64 libutempter-1.2.1-10.fc39.aarch64 libuuid-2.39.4-1.fc39.aarch64 libverto-0.3.2-6.fc39.aarch64 libxcrypt-4.4.36-2.fc39.aarch64 libxml2-2.10.4-3.fc39.aarch64 libzstd-1.5.6-1.fc39.aarch64 lua-libs-5.4.6-3.fc39.aarch64 lua-srpm-macros-1-13.fc39.noarch lz4-libs-1.9.4-4.fc39.aarch64 mpfr-4.2.0-3.fc39.aarch64 ncurses-base-6.4-7.20230520.fc39.1.noarch ncurses-libs-6.4-7.20230520.fc39.1.aarch64 ocaml-srpm-macros-8-2.fc39.noarch openblas-srpm-macros-2-14.fc39.noarch openldap-2.6.6-1.fc39.aarch64 openssl-libs-1:3.1.1-4.fc39.aarch64 p11-kit-0.25.3-1.fc39.aarch64 p11-kit-trust-0.25.3-1.fc39.aarch64 package-notes-srpm-macros-0.5-9.fc39.noarch pam-1.5.3-3.fc39.aarch64 pam-libs-1.5.3-3.fc39.aarch64 patch-2.7.6-22.fc39.aarch64 pcre2-10.42-1.fc39.2.aarch64 pcre2-syntax-10.42-1.fc39.2.noarch perl-srpm-macros-1-51.fc39.noarch pkgconf-1.9.5-2.fc39.aarch64 pkgconf-m4-1.9.5-2.fc39.noarch pkgconf-pkg-config-1.9.5-2.fc39.aarch64 popt-1.19-3.fc39.aarch64 publicsuffix-list-dafsa-20240107-1.fc39.noarch pyproject-srpm-macros-1.12.0-1.fc39.noarch python-srpm-macros-3.12-4.fc39.noarch qt5-srpm-macros-5.15.12-1.fc39.noarch qt6-srpm-macros-6.6.2-1.fc39.noarch readline-8.2-6.fc39.aarch64 redhat-rpm-config-266-1.fc39.noarch rpm-4.19.1.1-1.fc39.aarch64 rpm-build-4.19.1.1-1.fc39.aarch64 rpm-build-libs-4.19.1.1-1.fc39.aarch64 rpm-libs-4.19.1.1-1.fc39.aarch64 rpm-sequoia-1.6.0-1.fc39.aarch64 rpmautospec-rpm-macros-0.6.3-1.fc39.noarch rust-srpm-macros-26.2-1.fc39.noarch sed-4.8-14.fc39.aarch64 setup-2.14.4-1.fc39.noarch shadow-utils-2:4.14.0-2.fc39.aarch64 sqlite-libs-3.42.0-7.fc39.aarch64 systemd-libs-254.10-1.fc39.aarch64 tar-2:1.35-2.fc39.aarch64 unzip-6.0-62.fc39.aarch64 util-linux-2.39.4-1.fc39.aarch64 util-linux-core-2.39.4-1.fc39.aarch64 which-2.21-40.fc39.aarch64 xxhash-libs-0.8.2-1.fc39.aarch64 xz-5.4.4-1.fc39.aarch64 xz-libs-5.4.4-1.fc39.aarch64 zip-3.0-39.fc39.aarch64 zlib-1.2.13-4.fc39.aarch64 zstd-1.5.6-1.fc39.aarch64 Complete! 
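[Note: at this point the 152-package minimal buildroot exists under /var/lib/mock/fedora-39-aarch64-1712885724.178146/. When replaying this build locally, one way to inspect such a buildroot before it is cleaned up is mock's shell mode; a sketch, assuming the same config and unique extension this log uses and that cleanup has not yet run:

  mock -r /var/lib/copr-rpmbuild/results/configs/child.cfg --uniqueext 1712885724.178146 --shell

--shell and --uniqueext are standard mock options; the log later reports cleanup_on_success=True, so this chroot is removed once the SRPM stage succeeds.]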
Finish: installing minimal buildroot with dnf Start: creating root cache Finish: creating root cache Finish: chroot init INFO: Installed packages: INFO: alternatives-1.26-1.fc39.aarch64 ansible-srpm-macros-1-12.fc39.noarch audit-libs-3.1.2-8.fc39.aarch64 authselect-1.4.3-1.fc39.aarch64 authselect-libs-1.4.3-1.fc39.aarch64 basesystem-11-18.fc39.noarch bash-5.2.26-1.fc39.aarch64 binutils-2.40-14.fc39.aarch64 binutils-gold-2.40-14.fc39.aarch64 bzip2-1.0.8-16.fc39.aarch64 bzip2-libs-1.0.8-16.fc39.aarch64 ca-certificates-2023.2.60_v7.0.306-2.fc39.noarch coreutils-9.3-5.fc39.aarch64 coreutils-common-9.3-5.fc39.aarch64 cpio-2.14-4.fc39.aarch64 cracklib-2.9.11-2.fc39.aarch64 crypto-policies-20231204-1.git1e3a2e4.fc39.noarch curl-8.2.1-4.fc39.aarch64 cyrus-sasl-lib-2.1.28-11.fc39.aarch64 debugedit-5.0-12.fc39.aarch64 diffutils-3.10-3.fc39.aarch64 dwz-0.15-3.fc39.aarch64 ed-1.19-4.fc39.aarch64 efi-srpm-macros-5-9.fc39.noarch elfutils-0.191-2.fc39.aarch64 elfutils-debuginfod-client-0.191-2.fc39.aarch64 elfutils-default-yama-scope-0.191-2.fc39.noarch elfutils-libelf-0.191-2.fc39.aarch64 elfutils-libs-0.191-2.fc39.aarch64 fedora-gpg-keys-39-1.noarch fedora-release-39-36.noarch fedora-release-common-39-36.noarch fedora-release-identity-basic-39-36.noarch fedora-repos-39-1.noarch file-5.44-5.fc39.aarch64 file-libs-5.44-5.fc39.aarch64 filesystem-3.18-6.fc39.aarch64 findutils-4.9.0-5.fc39.aarch64 fonts-srpm-macros-2.0.5-12.fc39.noarch forge-srpm-macros-0.2.0-3.fc39.noarch fpc-srpm-macros-1.3-8.fc39.noarch gawk-5.2.2-2.fc39.aarch64 gdb-minimal-14.2-1.fc39.aarch64 gdbm-libs-1.23-4.fc39.aarch64 ghc-srpm-macros-1.6.1-2.fc39.noarch glibc-2.38-99.fc39.aarch64 glibc-common-2.38-99.fc39.aarch64 glibc-gconv-extra-2.38-99.fc39.aarch64 glibc-minimal-langpack-2.38-99.fc39.aarch64 gmp-6.2.1-5.fc39.aarch64 gnat-srpm-macros-6-3.fc39.noarch go-srpm-macros-3.5.0-1.fc39.noarch gpg-pubkey-18b8e74c-62f2920f grep-3.11-3.fc39.aarch64 gzip-1.12-6.fc39.aarch64 info-7.0.3-3.fc39.aarch64 jansson-2.13.1-7.fc39.aarch64 kernel-srpm-macros-1.0-20.fc39.noarch keyutils-libs-1.6.3-1.fc39.aarch64 krb5-libs-1.21.2-3.fc39.aarch64 libacl-2.3.1-9.fc39.aarch64 libarchive-3.7.1-1.fc39.aarch64 libattr-2.5.1-8.fc39.aarch64 libblkid-2.39.4-1.fc39.aarch64 libbrotli-1.1.0-1.fc39.aarch64 libcap-2.48-9.fc39.aarch64 libcap-ng-0.8.3-8.fc39.aarch64 libcom_err-1.47.0-2.fc39.aarch64 libcurl-8.2.1-4.fc39.aarch64 libdb-5.3.28-56.fc39.aarch64 libeconf-0.5.2-2.fc39.aarch64 libevent-2.1.12-9.fc39.aarch64 libfdisk-2.39.4-1.fc39.aarch64 libffi-3.4.4-4.fc39.aarch64 libgcc-13.2.1-7.fc39.aarch64 libgomp-13.2.1-7.fc39.aarch64 libidn2-2.3.7-1.fc39.aarch64 libmount-2.39.4-1.fc39.aarch64 libnghttp2-1.55.1-4.fc39.aarch64 libnsl2-2.0.0-6.fc39.aarch64 libpkgconf-1.9.5-2.fc39.aarch64 libpsl-0.21.2-4.fc39.aarch64 libpwquality-1.4.5-6.fc39.aarch64 libselinux-3.5-5.fc39.aarch64 libsemanage-3.5-4.fc39.aarch64 libsepol-3.5-2.fc39.aarch64 libsigsegv-2.14-5.fc39.aarch64 libsmartcols-2.39.4-1.fc39.aarch64 libssh-0.10.6-2.fc39.aarch64 libssh-config-0.10.6-2.fc39.noarch libstdc++-13.2.1-7.fc39.aarch64 libtasn1-4.19.0-3.fc39.aarch64 libtirpc-1.3.4-1.rc3.fc39.aarch64 libunistring-1.1-5.fc39.aarch64 libutempter-1.2.1-10.fc39.aarch64 libuuid-2.39.4-1.fc39.aarch64 libverto-0.3.2-6.fc39.aarch64 libxcrypt-4.4.36-2.fc39.aarch64 libxml2-2.10.4-3.fc39.aarch64 libzstd-1.5.6-1.fc39.aarch64 lua-libs-5.4.6-3.fc39.aarch64 lua-srpm-macros-1-13.fc39.noarch lz4-libs-1.9.4-4.fc39.aarch64 mpfr-4.2.0-3.fc39.aarch64 ncurses-base-6.4-7.20230520.fc39.1.noarch ncurses-libs-6.4-7.20230520.fc39.1.aarch64 
ocaml-srpm-macros-8-2.fc39.noarch openblas-srpm-macros-2-14.fc39.noarch openldap-2.6.6-1.fc39.aarch64 openssl-libs-3.1.1-4.fc39.aarch64 p11-kit-0.25.3-1.fc39.aarch64 p11-kit-trust-0.25.3-1.fc39.aarch64 package-notes-srpm-macros-0.5-9.fc39.noarch pam-1.5.3-3.fc39.aarch64 pam-libs-1.5.3-3.fc39.aarch64 patch-2.7.6-22.fc39.aarch64 pcre2-10.42-1.fc39.2.aarch64 pcre2-syntax-10.42-1.fc39.2.noarch perl-srpm-macros-1-51.fc39.noarch pkgconf-1.9.5-2.fc39.aarch64 pkgconf-m4-1.9.5-2.fc39.noarch pkgconf-pkg-config-1.9.5-2.fc39.aarch64 popt-1.19-3.fc39.aarch64 publicsuffix-list-dafsa-20240107-1.fc39.noarch pyproject-srpm-macros-1.12.0-1.fc39.noarch python-srpm-macros-3.12-4.fc39.noarch qt5-srpm-macros-5.15.12-1.fc39.noarch qt6-srpm-macros-6.6.2-1.fc39.noarch readline-8.2-6.fc39.aarch64 redhat-rpm-config-266-1.fc39.noarch rpm-4.19.1.1-1.fc39.aarch64 rpm-build-4.19.1.1-1.fc39.aarch64 rpm-build-libs-4.19.1.1-1.fc39.aarch64 rpm-libs-4.19.1.1-1.fc39.aarch64 rpm-sequoia-1.6.0-1.fc39.aarch64 rpmautospec-rpm-macros-0.6.3-1.fc39.noarch rust-srpm-macros-26.2-1.fc39.noarch sed-4.8-14.fc39.aarch64 setup-2.14.4-1.fc39.noarch shadow-utils-4.14.0-2.fc39.aarch64 sqlite-libs-3.42.0-7.fc39.aarch64 systemd-libs-254.10-1.fc39.aarch64 tar-1.35-2.fc39.aarch64 unzip-6.0-62.fc39.aarch64 util-linux-2.39.4-1.fc39.aarch64 util-linux-core-2.39.4-1.fc39.aarch64 which-2.21-40.fc39.aarch64 xxhash-libs-0.8.2-1.fc39.aarch64 xz-5.4.4-1.fc39.aarch64 xz-libs-5.4.4-1.fc39.aarch64 zip-3.0-39.fc39.aarch64 zlib-1.2.13-4.fc39.aarch64 zstd-1.5.6-1.fc39.aarch64
Start: buildsrpm
Start: rpmbuild -bs
warning: %patchN is deprecated (2 usages found), use %patch N (or %patch -P N)
Building target platforms: aarch64
Building for target aarch64
setting SOURCE_DATE_EPOCH=1554595200
Wrote: /builddir/build/SRPMS/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.src.rpm
RPM build warnings:
    %patchN is deprecated (2 usages found), use %patch N (or %patch -P N)
Finish: rpmbuild -bs
cp: preserving permissions for ‘/var/lib/copr-rpmbuild/results/chroot_scan/var/lib/mock/fedora-39-aarch64-1712885724.178146/root/var/log’: No such file or directory
INFO: chroot_scan: 3 files copied to /var/lib/copr-rpmbuild/results/chroot_scan
INFO: /var/lib/mock/fedora-39-aarch64-1712885724.178146/root/var/log/dnf.rpm.log
/var/lib/mock/fedora-39-aarch64-1712885724.178146/root/var/log/dnf.librepo.log
/var/lib/mock/fedora-39-aarch64-1712885724.178146/root/var/log/dnf.log
Finish: buildsrpm
INFO: Done(/var/lib/copr-rpmbuild/workspace/workdir-lh244o7v/pytorch/pytorch.spec) Config(child) 1 minutes 20 seconds
INFO: Results and/or logs in: /var/lib/copr-rpmbuild/results
INFO: Cleaning up build root ('cleanup_on_success=True')
Start: clean chroot
INFO: unmounting tmpfs.
Finish: clean chroot
INFO: Start(/var/lib/copr-rpmbuild/results/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.src.rpm) Config(fedora-39-aarch64)
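[Note: rpmbuild prints the %patchN deprecation warning on each pass over pytorch.spec (above for the SRPM, again in the build setup below); "(2 usages found)" refers to two %patchN lines in the spec itself. A minimal spec-file sketch of the rewrite it asks for; the patch number and the -p1 argument are illustrative only, since pytorch.spec is not part of this log:

  # deprecated spelling that triggers the warning:
  %patch1 -p1
  # either replacement accepted by current rpmbuild:
  %patch 1 -p1
  %patch -P 1 -p1

Both replacement forms are quoted verbatim from the warning text.]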
INFO: calling preinit hooks
INFO: enabled root cache
Start: unpacking root cache
Finish: unpacking root cache
INFO: enabled package manager cache
Start: cleaning package manager metadata
Finish: cleaning package manager metadata
INFO: enabled HW Info plugin
INFO: Buildroot is handled by package management downloaded with a bootstrap image: rpm-4.19.1.1-1.fc39.aarch64 rpm-sequoia-1.6.0-1.fc39.aarch64 python3-dnf-4.19.2-1.fc39.noarch python3-dnf-plugins-core-4.6.0-1.fc39.noarch yum-4.19.2-1.fc39.noarch
Finish: chroot init
Start: build phase for pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.src.rpm
Start: build setup for pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.src.rpm
warning: %patchN is deprecated (2 usages found), use %patch N (or %patch -P N)
Building target platforms: aarch64
Building for target aarch64
setting SOURCE_DATE_EPOCH=1554595200
Wrote: /builddir/build/SRPMS/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.src.rpm
RPM build warnings: %patchN is deprecated (2 usages found), use %patch N (or %patch -P N)
No matches found for the following disable plugin patterns: local, spacewalk, versionlock
Copr repository 65 kB/s | 1.5 kB 00:00
Additional repo copr_rezso_CUDA 79 kB/s | 1.5 kB 00:00
Additional repo http_developer_download_nvidia_ 708 kB/s | 3.5 kB 00:00
Additional repo http_developer_download_nvidia_ 815 kB/s | 3.5 kB 00:00
Additional repo http_developer_download_nvidia_ 822 kB/s | 3.5 kB 00:00
fedora 95 kB/s | 14 kB 00:00
updates 385 kB/s | 14 kB 00:00
Dependencies resolved.
=====================================================================================================================================================================
 Package Arch Version Repository Size
=====================================================================================================================================================================
Installing:
asmjit-devel aarch64 1:0-20220702.1.gitc5984762.fc39 copr_base 230 k cpuinfo-devel aarch64 1:0-20240327.0.gitf42f5eaf.fc39 copr_base 24 k cuda-cudart-devel-12-3 aarch64 12.3.101-1 http_developer_download_nvidia_com_compute_cuda_repos_rhel8_sbsa 2.0 M cuda-cupti-12-3 aarch64 12.3.101-1 http_developer_download_nvidia_com_compute_cuda_repos_rhel8_sbsa 15 M cuda-driver-devel-12-3 aarch64 12.3.101-1 http_developer_download_nvidia_com_compute_cuda_repos_rhel8_sbsa 43 k cuda-gcc-12-c++ aarch64 12.3.1-1.fc39 copr_base 14 M cuda-nvcc-12-3 aarch64 12.3.107-1 http_developer_download_nvidia_com_compute_cuda_repos_rhel8_sbsa 59 M cuda-nvml-devel-12-3 aarch64 12.3.101-1 http_developer_download_nvidia_com_compute_cuda_repos_rhel8_sbsa 119 k cuda-nvrtc-devel-12-3 aarch64 12.3.107-1 http_developer_download_nvidia_com_compute_cuda_repos_rhel8_sbsa 22 M cuda-nvtx-12-3 aarch64 12.3.101-1 http_developer_download_nvidia_com_compute_cuda_repos_rhel8_sbsa 89 k cuda-profiler-api-12-3 aarch64 12.3.101-1 http_developer_download_nvidia_com_compute_cuda_repos_rhel8_sbsa 26 k cutlass-devel aarch64 3.4.1-20240215.0.cu12_3.fc39 copr_base 774 k doxygen aarch64 2:1.9.7-3.fc39 fedora 4.8 M eigen3-devel noarch 3.4.0-12.fc39 fedora 1.2 M fftw-devel aarch64 3.3.10-10.fc39 updates 133 k flatbuffers-compiler aarch64 23.5.26-3.fc39 fedora 926 k flatbuffers-devel aarch64 23.5.26-3.fc39 fedora 111 k foxi-devel aarch64 0-20210526.1.gitc278588e.fc37 copr_base 25 k fp16-devel aarch64 1:0-20240410.0.git581ac1c7.fc39 copr_base 13 k fxdiv-devel noarch 1:0-20201208.1.git63058eff.fc39 copr_base 12 k gcc-c++ aarch64 13.2.1-7.fc39 updates 12 M gemmlowp-devel noarch
0-20231104.0.git16e8662c.fc39 copr_base 157 k gflags-devel aarch64 2.2.2-12.fc39 fedora 24 k git aarch64 2.44.0-1.fc39 updates 53 k glog-devel aarch64 0.3.5-18.fc39 fedora 38 k gloo-devel aarch64 1:0.5.0-20240302.0.git2565674c.cu12_3.fc39 copr_base 74 k gmp-devel aarch64 1:6.2.1-5.fc39 fedora 174 k hiredis-devel aarch64 1.0.2-5.fc39 fedora 37 k kineto-devel aarch64 0.4.0-20240327.0.git445909a8.cu12_3.fc39 copr_base 23 k leveldb-devel aarch64 1.23-7.fc39 fedora 53 k libcublas-devel-12-3 aarch64 12.3.4.1-2 copr_rezso_CUDA 75 k libcudnn8-devel aarch64 8.9.7.29-2.cuda12.3 copr_rezso_CUDA 34 k libcufft-devel-12-3 aarch64 11.0.12.1-2 copr_rezso_CUDA 33 k libcurand-devel-12-3 aarch64 10.3.4.107-1 http_developer_download_nvidia_com_compute_cuda_repos_rhel8_sbsa 53 M libcusolver-devel-12-3 aarch64 11.5.4.101-2 copr_rezso_CUDA 61 k libcusparse-devel-12-3 aarch64 12.2.0.103-2 copr_rezso_CUDA 108 M libnccl-devel aarch64 2.21.5-1+cuda12.4 http_developer_download_nvidia_com_compute_cuda_repos_rhel8_sbsa 16 k libnvjitlink-devel-12-3 aarch64 12.3.101-1 http_developer_download_nvidia_com_compute_cuda_repos_rhel8_sbsa 17 M libuv-devel aarch64 1:1.48.0-1.fc39 updates 42 k libzstd-devel aarch64 1.5.6-1.fc39 updates 52 k lmdb-devel aarch64 0.9.32-1.fc39 updates 26 k magma-devel aarch64 2.8.0-20240328.0.cu12_3.fc39 copr_base 985 k mesa-libGLU-devel aarch64 9.0.3-1.fc39 fedora 12 k miniz-devel aarch64 3.0.2-3.fc39 fedora 33 k mpfr-devel aarch64 4.2.0-3.fc39 fedora 22 k neon2sse-devel noarch 0-20230131.0.git097a5eca.fc38 copr_base 85 k nnpack-devel aarch64 0-20230201.0.git70a77f48.fc38 copr_base 16 k numactl-devel aarch64 2.0.16-3.fc39 fedora 22 k ocl-icd-devel aarch64 2.3.2-2.fc39 fedora 58 k onnx-devel aarch64 1.17.0-20240404.0.git4128a090.fc39 copr_base 129 k onnx-optimizer-devel aarch64 0.3.19-20240303.0.gitb3a46118.fc39 copr_base 50 k openblas-devel aarch64 0.3.21-6.fc39 fedora 80 k openblas-openmp aarch64 0.3.21-6.fc39 fedora 3.7 M opencv-devel aarch64 4.9.0-20231227.1.cu12_3.fc39 copr_base 1.3 M peachpy-python3 noarch 0-20221113.1.git349e8f83.fc39 copr_base 674 k protobuf-compat-compiler aarch64 3.21.9-2.fc39 copr_base 834 k protobuf-compat-devel aarch64 3.21.9-2.fc39 copr_base 374 k psimd-devel noarch 1:0-20200517.2.git072586a7.fc39 copr_base 13 k pthreadpool-devel aarch64 1:0.1-20240121.0.git178e3e06.fc39 copr_base 15 k pybind11-devel aarch64 2.11.1-1.fc39 fedora 176 k python3-devel aarch64 3.12.2-2.fc39 updates 312 k python3-numpy aarch64 1:1.24.4-2.fc39 fedora 7.2 M python3-pybind11 aarch64 2.11.1-1.fc39 fedora 198 k python3-pyyaml aarch64 6.0.1-11.fc39 fedora 223 k python3-setuptools noarch 67.7.2-7.fc39 fedora 1.5 M python3-six noarch 1.16.0-12.fc39 fedora 41 k python3-typing-extensions noarch 4.8.0-1.fc39 fedora 75 k qnnpack-devel aarch64 0-20190828.2.git7d2a4e99.fc38 copr_base 12 k rdma-core-devel aarch64 46.0-4.fc39 fedora 429 k rocksdb-devel aarch64 8.1.1-2.fc39 fedora 292 k sleef-devel aarch64 3.6-20240320.0.git60e76d2b.fc39 copr_base 24 k snappy-devel aarch64 1.1.10-2.fc39 fedora 22 k tbb-devel aarch64 2020.3-20.fc39 fedora 335 k tensorpipe-devel aarch64 0-20220513.1.gitbb1473a4.fc37 copr_base 109 k zeromq-devel aarch64 4.3.4-8.fc39 fedora 16 k Installing dependencies: Lmod aarch64 8.7.32-1.fc39 fedora 262 k MUMPS aarch64 5.5.1-5.fc39 fedora 1.8 M MUMPS-common noarch 5.5.1-5.fc39 fedora 830 k SuperLU aarch64 6.0.0-1.fc39 fedora 172 k abattis-cantarell-vf-fonts noarch 0.301-10.fc39 fedora 121 k adobe-mappings-cmap noarch 20230622-1.fc39 fedora 2.1 M adobe-mappings-cmap-deprecated noarch 
20230622-1.fc39 fedora 113 k adobe-mappings-pdf noarch 20190401-5.fc39 fedora 698 k alsa-lib aarch64 1.2.11-2.fc39 updates 510 k annobin-docs noarch 12.46-1.fc39 updates 88 k annobin-plugin-gcc aarch64 12.46-1.fc39 updates 958 k armadillo aarch64 12.8.1-1.fc39 updates 31 k arpack aarch64 3.9.1-1.fc39 updates 178 k asmjit aarch64 1:0-20220702.1.gitc5984762.fc39 copr_base 204 k avahi-libs aarch64 0.8-24.fc39 fedora 67 k blosc aarch64 1.21.5-2.fc39 updates 48 k byte-buddy noarch 1.14.2-2.fc39 fedora 3.2 M byte-buddy-agent noarch 1.14.2-2.fc39 fedora 215 k cairo aarch64 1.18.0-1.fc39 fedora 692 k cairo-gobject aarch64 1.18.0-1.fc39 fedora 18 k cdparanoia-libs aarch64 10.2-42.fc39 fedora 53 k ceres-solver aarch64 2.1.0-6.fc39 fedora 657 k cfitsio aarch64 4.3.0-1.fc39 fedora 585 k cgnslib-libs aarch64 4.4.0-2.fc39 fedora 294 k cjson aarch64 1.7.15-2.fc39 fedora 32 k clang16-libs aarch64 16.0.6-3.fc39 fedora 21 M clang16-resource-filesystem aarch64 16.0.6-3.fc39 fedora 13 k cliquer-libs aarch64 1.22-6.fc39 fedora 38 k cmake aarch64 3.27.7-1.fc39 fedora 7.4 M cmake-data noarch 3.27.7-1.fc39 fedora 2.2 M cmake-filesystem aarch64 3.27.7-1.fc39 fedora 19 k cmake-rpm-macros noarch 3.27.7-1.fc39 fedora 18 k codec2 aarch64 1.2.0-2.fc39 fedora 636 k coin-or-Cbc aarch64 2.10.5-13.fc39 fedora 765 k coin-or-Cgl aarch64 0.60.3-10.fc39 fedora 387 k coin-or-Clp aarch64 1.17.6-13.fc39 fedora 843 k coin-or-CoinUtils aarch64 2.11.4-10.fc39 fedora 443 k coin-or-Osi aarch64 0.108.6-9.fc39 fedora 292 k copy-jdk-configs noarch 4.1-3.fc39 fedora 28 k cpp aarch64 13.2.1-7.fc39 updates 9.7 M cpuinfo aarch64 1:0-20240327.0.gitf42f5eaf.fc39 copr_base 47 k crypto-policies-scripts noarch 20231204-1.git1e3a2e4.fc39 updates 117 k cuda-cccl-12-3 aarch64 12.3.101-1 http_developer_download_nvidia_com_compute_cuda_repos_rhel8_sbsa 1.9 M cuda-crt-12-3 aarch64 12.3.107-1 http_developer_download_nvidia_com_compute_cuda_repos_rhel8_sbsa 111 k cuda-cudart-12-3 aarch64 12.3.101-1 http_developer_download_nvidia_com_compute_cuda_repos_rhel8_sbsa 233 k cuda-gcc-12 aarch64 12.3.1-1.fc39 copr_base 29 M cuda-nvrtc-12-3 aarch64 12.3.107-1 http_developer_download_nvidia_com_compute_cuda_repos_rhel8_sbsa 23 M cuda-nvvm-12-3 aarch64 12.3.107-1 http_developer_download_nvidia_com_compute_cuda_repos_rhel8_sbsa 25 M cuda-toolkit-12-3-config-common noarch 12.3.101-1 http_developer_download_nvidia_com_compute_cuda_repos_rhel8_x86_64 7.7 k cuda-toolkit-12-config-common noarch 12.4.127-1 http_developer_download_nvidia_com_compute_cuda_repos_rhel8_x86_64 7.9 k cuda-toolkit-config-common noarch 12.4.127-1 http_developer_download_nvidia_com_compute_cuda_repos_rhel8_x86_64 7.9 k cups-libs aarch64 1:2.4.7-11.fc39 updates 268 k cutlass aarch64 3.4.1-20240215.0.cu12_3.fc39 copr_base 179 M dbus aarch64 1:1.14.10-1.fc39 fedora 8.1 k dbus-broker aarch64 35-2.fc39 updates 172 k dbus-common noarch 1:1.14.10-1.fc39 fedora 15 k dbus-libs aarch64 1:1.14.10-1.fc39 fedora 156 k default-fonts-core-sans noarch 4.0-9.fc39 fedora 32 k double-conversion aarch64 3.1.5-9.fc39 fedora 46 k duktape aarch64 2.7.0-5.fc39 fedora 170 k emacs-filesystem noarch 1:29.3-1.fc39 updates 7.2 k expat aarch64 2.6.2-1.fc39 updates 112 k fdk-aac-free aarch64 2.0.0-11.fc39 fedora 326 k fftw aarch64 3.3.10-10.fc39 updates 40 k fftw-libs aarch64 3.3.10-10.fc39 updates 8.0 k fftw-libs-double aarch64 3.3.10-10.fc39 updates 835 k fftw-libs-long aarch64 3.3.10-10.fc39 updates 784 k fftw-libs-single aarch64 3.3.10-10.fc39 updates 881 k flatbuffers aarch64 23.5.26-3.fc39 fedora 185 k flexiblas aarch64 
3.4.2-1.fc39 updates 25 k flexiblas-netlib aarch64 3.4.2-1.fc39 updates 2.6 M flexiblas-netlib64 aarch64 3.4.2-1.fc39 updates 2.5 M flexiblas-openblas-openmp aarch64 3.4.2-1.fc39 updates 17 k flexiblas-openblas-openmp64 aarch64 3.4.2-1.fc39 updates 17 k fontconfig aarch64 2.14.2-6.fc39 updates 302 k fonts-filesystem noarch 1:2.0.5-12.fc39 fedora 8.2 k foxi aarch64 0-20210526.1.gitc278588e.fc37 copr_base 12 k fp16 aarch64 1:0-20240410.0.git581ac1c7.fc39 copr_base 12 k freetype aarch64 2.13.1-2.fc39 fedora 406 k freexl aarch64 2.0.0-2.fc39 fedora 45 k fribidi aarch64 1.0.13-2.fc39 fedora 91 k game-music-emu aarch64 0.6.3-12.fc39 fedora 151 k gc aarch64 8.2.2-4.fc39 fedora 110 k gcc aarch64 13.2.1-7.fc39 updates 31 M gcc-plugin-annobin aarch64 13.2.1-7.fc39 updates 52 k gd aarch64 2.3.3-12.fc39 fedora 133 k gdal-libs aarch64 3.7.3-4.fc39 updates 8.0 M gdk-pixbuf2 aarch64 2.42.10-5.fc39 fedora 482 k gdk-pixbuf2-modules aarch64 2.42.10-5.fc39 fedora 87 k gecode aarch64 6.2.0-12.fc39 fedora 2.9 M geos aarch64 3.12.1-1.fc39 updates 1.0 M gflags aarch64 2.2.2-12.fc39 fedora 86 k giflib aarch64 5.2.2-1.fc39 updates 52 k git-core aarch64 2.44.0-1.fc39 updates 4.6 M git-core-doc noarch 2.44.0-1.fc39 updates 2.9 M gklib aarch64 5.1.1-20230326.0.git8bd6bad7.fc39 copr_base 93 k gl-manpages noarch 1.1-28.20190306.fc39 fedora 1.2 M glib2 aarch64 2.78.3-1.fc39 updates 2.8 M glibc-devel aarch64 2.38-99.fc39 copr_base 498 k glog aarch64 0.3.5-18.fc39 fedora 65 k gloo aarch64 1:0.5.0-20240302.0.git2565674c.cu12_3.fc39 copr_base 747 k glpk aarch64 5.0-7.fc39 fedora 355 k glx-utils aarch64 9.0.0-3.fc39 fedora 79 k gmp-c++ aarch64 1:6.2.1-5.fc39 fedora 18 k gnutls aarch64 3.8.4-1.fc39 updates 1.1 M google-droid-sans-fonts noarch 20200215-17.fc39 fedora 2.7 M google-noto-fonts-common noarch 20240101-1.fc39 updates 17 k google-noto-sans-vf-fonts noarch 20240101-1.fc39 updates 593 k graphene aarch64 1.10.6-6.fc39 fedora 62 k graphite2 aarch64 1.3.14-12.fc39 fedora 93 k graphviz aarch64 8.1.0-6.fc39 updates 4.9 M groff-base aarch64 1.23.0-3.fc39 updates 1.1 M gsl aarch64 2.7.1-5.fc39 fedora 1.0 M gsm aarch64 1.0.22-3.fc39 fedora 36 k gstreamer1 aarch64 1.22.9-1.fc39 updates 1.4 M gstreamer1-plugins-base aarch64 1.22.9-1.fc39 updates 2.1 M gts aarch64 0.7.6-46.20121130.fc39 fedora 234 k guile22 aarch64 2.2.7-9.fc39 fedora 6.5 M halide aarch64 17.0.1-20240220.0.fc39 copr_base 20 M harfbuzz aarch64 8.2.1-2.fc39 fedora 934 k hdf-libs aarch64 4.2.15-13.fc39 fedora 279 k hdf5 aarch64 1.12.1-12.fc39 fedora 2.1 M highway aarch64 1.1.0-1.fc39 updates 97 k hiredis aarch64 1.0.2-5.fc39 fedora 42 k ilbc aarch64 3.0.4-7.fc39 fedora 52 k imath aarch64 3.1.10-1.fc39 updates 93 k infiniband-diags aarch64 46.0-4.fc39 fedora 336 k isl aarch64 0.16.1-18.fc39 fedora 838 k iso-codes noarch 4.15.0-2.fc39 fedora 3.5 M jacop noarch 4.9.0-2.fc39 fedora 1.7 M java-17-openjdk-headless aarch64 1:17.0.9.0.9-3.fc39 updates 44 M javapackages-filesystem noarch 6.1.0-10.fc39 fedora 12 k javapackages-tools noarch 6.1.0-10.fc39 fedora 37 k jbig2dec-libs aarch64 0.19-10.fc39 fedora 71 k jbigkit-libs aarch64 2.1-26.fc39 fedora 53 k json-c aarch64 0.17-1.fc39 fedora 44 k jsoncpp aarch64 1.9.5-5.fc39 fedora 91 k kernel-headers aarch64 6.8.3-200.fc39 updates 1.6 M keyutils-libs-devel aarch64 1.6.3-1.fc39 updates 60 k kineto aarch64 0.4.0-20240327.0.git445909a8.cu12_3.fc39 copr_base 278 k kmod-libs aarch64 30-6.fc39 fedora 67 k krb5-devel aarch64 1.21.2-3.fc39 updates 144 k lame-libs aarch64 3.100-15.fc39 fedora 335 k lasi aarch64 1.1.3-11.fc39 fedora 53 k 
lcms2 aarch64 2.15-2.fc39 fedora 176 k less aarch64 633-2.fc39 fedora 176 k leveldb aarch64 1.23-7.fc39 fedora 146 k libGLEW aarch64 2.2.0-5.fc39 fedora 177 k libICE aarch64 1.0.10-11.fc39 fedora 70 k libSM aarch64 1.2.3-13.fc39 fedora 41 k libX11 aarch64 1.8.7-1.fc39 fedora 644 k libX11-common noarch 1.8.7-1.fc39 fedora 176 k libX11-devel aarch64 1.8.7-1.fc39 fedora 1.0 M libX11-xcb aarch64 1.8.7-1.fc39 fedora 12 k libXau aarch64 1.0.11-3.fc39 fedora 32 k libXau-devel aarch64 1.0.11-3.fc39 fedora 14 k libXcursor aarch64 1.2.1-4.fc39 fedora 30 k libXext aarch64 1.3.5-3.fc39 fedora 39 k libXfixes aarch64 6.0.0-6.fc39 fedora 19 k libXft aarch64 2.3.8-3.fc39 fedora 71 k libXi aarch64 1.8.1-2.fc39 fedora 39 k libXpm aarch64 3.5.17-1.fc39 updates 64 k libXrender aarch64 0.9.11-3.fc39 fedora 27 k libXt aarch64 1.2.1-5.fc39 fedora 176 k libXv aarch64 1.0.11-19.fc39 fedora 18 k libXxf86vm aarch64 1.1.5-3.fc39 fedora 18 k libaec aarch64 1.1.2-1.fc39 updates 36 k libaom aarch64 3.8.2-1.fc39 updates 1.5 M libarrow aarch64 13.0.0-4.fc39 updates 4.3 M libarrow-doc noarch 13.0.0-4.fc39 updates 27 k libasan aarch64 13.2.1-7.fc39 updates 455 k libatomic aarch64 13.2.1-7.fc39 updates 42 k libavcodec-free aarch64 6.1.1-3.fc39 updates 3.9 M libavformat-free aarch64 6.1.1-3.fc39 updates 1.1 M libavif aarch64 0.11.1-11.fc39 fedora 80 k libavutil-free aarch64 6.1.1-3.fc39 updates 344 k libb2 aarch64 0.98.1-9.fc39 fedora 24 k libbluray aarch64 1.3.4-3.fc39 fedora 162 k libcbor aarch64 0.10.2-2.fc39 fedora 57 k libchromaprint aarch64 1.5.1-13.fc39 fedora 39 k libcom_err-devel aarch64 1.47.0-2.fc39 fedora 16 k libcublas-12-3 aarch64 12.3.4.1-2 copr_rezso_CUDA 245 M libcudnn8 aarch64 8.9.7.29-2.cuda12.3 copr_rezso_CUDA 446 M libcufft-12-3 aarch64 11.0.12.1-2 copr_rezso_CUDA 60 M libcurand-12-3 aarch64 10.3.4.107-1 http_developer_download_nvidia_com_compute_cuda_repos_rhel8_sbsa 53 M libcusolver-12-3 aarch64 11.5.4.101-2 copr_rezso_CUDA 76 M libcusparse-12-3 aarch64 12.2.0.103-2 copr_rezso_CUDA 108 M libdatrie aarch64 0.2.13-7.fc39 fedora 32 k libdav1d aarch64 1.2.1-2.fc39 fedora 350 k libdc1394 aarch64 2.2.7-3.fc39 fedora 130 k libdeflate aarch64 1.20-1.fc39 updates 62 k libdrm aarch64 2.4.120-1.fc39 updates 131 k libedit aarch64 3.1-48.20230828cvs.fc39 fedora 107 k libevdev aarch64 1.13.1-2.fc39 fedora 42 k libfido2 aarch64 1.13.0-3.fc39 fedora 96 k libgcrypt aarch64 1.10.2-2.fc39 fedora 451 k libgeotiff aarch64 1.7.1-9.fc39 fedora 104 k libgfortran aarch64 13.2.1-7.fc39 updates 438 k libglvnd aarch64 1:1.7.0-1.fc39 fedora 126 k libglvnd-core-devel aarch64 1:1.7.0-1.fc39 fedora 17 k libglvnd-devel aarch64 1:1.7.0-1.fc39 fedora 162 k libglvnd-egl aarch64 1:1.7.0-1.fc39 fedora 37 k libglvnd-gles aarch64 1:1.7.0-1.fc39 fedora 32 k libglvnd-glx aarch64 1:1.7.0-1.fc39 fedora 138 k libglvnd-opengl aarch64 1:1.7.0-1.fc39 fedora 44 k libgpg-error aarch64 1.47-2.fc39 fedora 230 k libgs aarch64 10.02.1-2.fc39 updates 3.4 M libgta aarch64 1.2.1-10.fc39 fedora 35 k libgudev aarch64 238-2.fc39 fedora 34 k libharu aarch64 2.4.3-3.fc39 fedora 580 k libibumad aarch64 46.0-4.fc39 fedora 27 k libibverbs aarch64 46.0-4.fc39 fedora 430 k libicu aarch64 73.2-2.fc39 fedora 10 M libijs aarch64 0.35-19.fc39 fedora 29 k libimagequant aarch64 4.0.3-2.fc39 updates 304 k libinput aarch64 1.25.0-4.fc39 updates 209 k libjpeg-turbo aarch64 2.1.4-3.fc39 fedora 196 k libjxl aarch64 1:0.8.2-3.fc39 fedora 777 k libkadm5 aarch64 1.21.2-3.fc39 updates 78 k libkml aarch64 1.3.0-45.fc39 fedora 331 k libldb aarch64 2.8.0-1.fc39 fedora 185 k liblerc 
aarch64 4.0.0-4.fc39 fedora 179 k libmodplug aarch64 1:0.8.9.0-17.fc39 fedora 170 k libmpc aarch64 1.3.1-3.fc39 fedora 72 k libnauty aarch64 2.8.8-1.fc39 updates 707 k libnccl aarch64 2.21.5-1+cuda12.4 http_developer_download_nvidia_com_compute_cuda_repos_rhel8_sbsa 130 M libnl3 aarch64 3.9.0-1.fc39 updates 345 k libnpp-12-3 aarch64 12.2.3.2-2 copr_rezso_CUDA 96 M libnvjitlink-12-3 aarch64 12.3.101-1 http_developer_download_nvidia_com_compute_cuda_repos_rhel8_sbsa 19 M libogg aarch64 2:1.3.5-6.fc39 fedora 33 k libopenmpt aarch64 0.6.12-1.fc39 updates 603 k liborc1 aarch64 1.9.3-1.fc39 updates 448 k libpaper aarch64 1:2.1.1-1.fc39 fedora 27 k libpng aarch64 2:1.6.37-15.fc39 fedora 115 k libpq aarch64 15.3-1.fc39 fedora 212 k libproxy aarch64 0.5.3-3.fc39 updates 48 k libqhull_r aarch64 1:7.2.1-13.fc39 fedora 164 k librabbitmq aarch64 0.13.0-3.fc39 fedora 44 k libraw1394 aarch64 2.1.2-18.fc39 fedora 65 k librdmacm aarch64 46.0-4.fc39 fedora 72 k librist aarch64 0.2.7-2.fc39 fedora 78 k librsvg2 aarch64 2.57.1-1.fc39 updates 1.5 M librttopo aarch64 1.1.0-12.fc39 fedora 203 k libseccomp aarch64 2.5.3-6.fc39 fedora 72 k libselinux-devel aarch64 3.5-5.fc39 fedora 151 k libsepol-devel aarch64 3.5-2.fc39 fedora 49 k libsmbclient aarch64 2:4.19.5-1.fc39 updates 81 k libsodium aarch64 1.0.18-15.fc39 updates 121 k libsodium-devel aarch64 1.0.18-15.fc39 updates 1.1 M libspatialite aarch64 5.0.1-23.fc39 fedora 2.8 M libstdc++-devel aarch64 13.2.1-7.fc39 updates 2.6 M libswresample-free aarch64 6.1.1-3.fc39 updates 63 k libswscale-free aarch64 6.1.1-3.fc39 updates 166 k libtalloc aarch64 2.4.1-1.fc39 fedora 30 k libtdb aarch64 1.4.9-1.fc39 fedora 52 k libtevent aarch64 0.15.0-1.fc39 fedora 45 k libthai aarch64 0.1.29-6.fc39 fedora 213 k libtheora aarch64 1:1.1.1-34.fc39 fedora 163 k libtiff aarch64 4.4.0-8.fc39 fedora 196 k libtool-ltdl aarch64 2.4.7-7.fc39 fedora 36 k libubsan aarch64 13.2.1-7.fc39 updates 209 k libudfread aarch64 1.1.2-6.fc39 fedora 33 k libunwind aarch64 1.7.0-0.2.rc2.fc39 fedora 72 k libunwind-devel aarch64 1.7.0-0.2.rc2.fc39 fedora 91 k liburing aarch64 2.5-1.fc39 updates 40 k libusb1 aarch64 1.0.27-1.fc39 updates 76 k libuv aarch64 1:1.48.0-1.fc39 updates 249 k libuv-static aarch64 1:1.48.0-1.fc39 updates 107 k libva aarch64 2.20.0-2.fc39 updates 108 k libvdpau aarch64 1.5-4.fc39 fedora 17 k libverto-devel aarch64 0.3.2-6.fc39 fedora 14 k libvisual aarch64 1:0.4.1-2.fc39 fedora 144 k libvorbis aarch64 1:1.3.7-8.fc39 fedora 191 k libvpx aarch64 1.13.1-1.fc39 updates 1.1 M libwacom aarch64 2.10.0-1.fc39 updates 43 k libwacom-data noarch 2.10.0-1.fc39 updates 196 k libwayland-client aarch64 1.22.0-2.fc39 fedora 33 k libwayland-cursor aarch64 1.22.0-2.fc39 fedora 19 k libwayland-egl aarch64 1.22.0-2.fc39 fedora 13 k libwayland-server aarch64 1.22.0-2.fc39 fedora 42 k libwbclient aarch64 2:4.19.5-1.fc39 updates 49 k libwebp aarch64 1.3.2-2.fc39 fedora 243 k libxcb aarch64 1.13.1-12.fc39 fedora 238 k libxcb-devel aarch64 1.13.1-12.fc39 fedora 1.4 M libxcrypt-devel aarch64 4.4.36-2.fc39 fedora 30 k libxkbcommon aarch64 1.6.0-1.fc39 updates 143 k libxkbcommon-x11 aarch64 1.6.0-1.fc39 updates 21 k libxshmfence aarch64 1.3-13.fc39 fedora 12 k libyaml aarch64 0.2.5-12.fc39 fedora 59 k lksctp-tools aarch64 1.0.19-4.fc39 fedora 93 k llvm-libs aarch64 17.0.6-3.fc39 updates 26 M llvm16-libs aarch64 16.0.6-5.fc39 fedora 25 M lmdb aarch64 0.9.32-1.fc39 updates 33 k lmdb-libs aarch64 0.9.32-1.fc39 updates 61 k lpcnetfreedv aarch64 0.5-3.fc39 fedora 7.3 M lua aarch64 5.4.6-3.fc39 fedora 189 k 
lua-filesystem aarch64 1.8.0-9.fc39 fedora 34 k lua-json noarch 1.3.4-4.fc39 fedora 30 k lua-lpeg aarch64 1.0.2-11.fc39 fedora 66 k lua-posix aarch64 36.2.1-3.fc39 fedora 147 k lua-term aarch64 0.07-18.fc39 fedora 15 k magma aarch64 2.8.0-20240328.0.cu12_3.fc39 copr_base 119 M make aarch64 1:4.4.1-2.fc39 fedora 585 k mariadb-connector-c aarch64 3.3.8-1.fc39 updates 214 k mariadb-connector-c-config noarch 3.3.8-1.fc39 updates 8.6 k mbedtls aarch64 2.28.7-1.fc39 updates 401 k mesa-filesystem aarch64 23.3.6-1.fc39 updates 19 k mesa-libEGL aarch64 23.3.6-1.fc39 updates 134 k mesa-libGL aarch64 23.3.6-1.fc39 updates 188 k mesa-libGLU aarch64 9.0.3-1.fc39 fedora 148 k mesa-libgbm aarch64 23.3.6-1.fc39 updates 47 k mesa-libglapi aarch64 23.3.6-1.fc39 updates 67 k metis aarch64 5.2.1-20230403.0.gite0f1b88b.fc39 copr_base 174 k miniz aarch64 3.0.2-3.fc39 fedora 65 k minizip-ng aarch64 3.0.7-5.fc39 updates 70 k mockito noarch 3.12.4-7.fc39 fedora 582 k mp aarch64 3.1.0-42.20200303git7fd4828.fc39 fedora 925 k mpdecimal aarch64 2.5.1-7.fc39 fedora 90 k mpg123-libs aarch64 1.31.3-2.fc39 fedora 347 k mtdev aarch64 1.1.6-6.fc39 fedora 20 k ncurses aarch64 6.4-7.20230520.fc39.1 updates 414 k netcdf aarch64 4.9.0-5.fc38 fedora 819 k netpbm aarch64 11.02.00-2.fc39 fedora 183 k nettle aarch64 3.9.1-2.fc39 fedora 434 k nnpack aarch64 0-20230201.0.git70a77f48.fc38 copr_base 81 k nspr aarch64 4.35.0-18.fc39 updates 135 k nss aarch64 3.98.0-1.fc39 updates 696 k nss-softokn aarch64 3.98.0-1.fc39 updates 415 k nss-softokn-freebl aarch64 3.98.0-1.fc39 updates 338 k nss-sysinit aarch64 3.98.0-1.fc39 updates 18 k nss-util aarch64 3.98.0-1.fc39 updates 86 k numactl-libs aarch64 2.0.16-3.fc39 fedora 30 k objectweb-asm noarch 9.5-2.fc39 fedora 360 k objenesis noarch 3.3-3.fc39 fedora 116 k ocl-icd aarch64 2.3.2-2.fc39 fedora 60 k ogdi aarch64 4.1.0-11.fc39 fedora 233 k onnx-libs aarch64 1.17.0-20240404.0.git4128a090.fc39 copr_base 779 k onnx-optimizer aarch64 0.3.19-20240303.0.gitb3a46118.fc39 copr_base 188 k openblas aarch64 0.3.21-6.fc39 fedora 35 k openblas-openmp64 aarch64 0.3.21-6.fc39 fedora 3.7 M openblas-openmp64_ aarch64 0.3.21-6.fc39 fedora 3.7 M openblas-serial aarch64 0.3.21-6.fc39 fedora 3.6 M openblas-serial64 aarch64 0.3.21-6.fc39 fedora 3.5 M openblas-serial64_ aarch64 0.3.21-6.fc39 fedora 3.5 M openblas-threads aarch64 0.3.21-6.fc39 fedora 3.7 M openblas-threads64 aarch64 0.3.21-6.fc39 fedora 3.7 M openblas-threads64_ aarch64 0.3.21-6.fc39 fedora 3.6 M opencl-headers noarch 3.0-19.20231212git2368105.fc39 updates 89 k opencore-amr aarch64 0.1.6-4.fc39 fedora 173 k opencv aarch64 4.9.0-20231227.1.cu12_3.fc39 copr_base 4.2 M opencv-contrib aarch64 4.9.0-20231227.1.cu12_3.fc39 copr_base 5.5 M opencv-core aarch64 4.9.0-20231227.1.cu12_3.fc39 copr_base 8.9 M opencv-cuda aarch64 4.9.0-20231227.1.cu12_3.fc39 copr_base 37 M opencv-static aarch64 4.9.0-20231227.1.cu12_3.fc39 copr_base 390 k openexr-libs aarch64 3.1.10-2.fc39 fedora 1.1 M openjpeg2 aarch64 2.5.2-1.fc39 updates 176 k openpgm aarch64 5.2.122-32.fc39 fedora 172 k openpgm-devel aarch64 5.2.122-32.fc39 fedora 67 k openslide aarch64 3.4.1-24.fc39 fedora 105 k openssh aarch64 9.3p1-10.fc39 updates 431 k openssh-clients aarch64 9.3p1-10.fc39 updates 729 k opentest4j noarch 1.2.0-14.fc39 fedora 24 k opus aarch64 1.3.1-13.fc39 fedora 205 k orc aarch64 0.4.33-3.fc39 fedora 202 k pango aarch64 1.51.0-1.fc39 fedora 339 k pcre aarch64 8.45-1.fc39.4 fedora 184 k pcre2-devel aarch64 10.42-1.fc39.2 fedora 505 k pcre2-utf16 aarch64 10.42-1.fc39.2 fedora 199 k 
pcre2-utf32 aarch64 10.42-1.fc39.2 fedora 187 k perl-AutoLoader noarch 5.74-502.fc39 updates 21 k perl-B aarch64 1.88-502.fc39 updates 178 k perl-Carp noarch 1.54-500.fc39 fedora 29 k perl-Class-Struct noarch 0.68-502.fc39 updates 22 k perl-Data-Dumper aarch64 2.188-501.fc39 fedora 55 k perl-Digest noarch 1.20-500.fc39 fedora 25 k perl-Digest-MD5 aarch64 2.58-500.fc39 fedora 36 k perl-DynaLoader aarch64 1.54-502.fc39 updates 26 k perl-Encode aarch64 4:3.19-500.fc39 fedora 1.7 M perl-Errno aarch64 1.37-502.fc39 updates 15 k perl-Error noarch 1:0.17029-13.fc39 fedora 40 k perl-Exporter noarch 5.77-500.fc39 fedora 31 k perl-Fcntl aarch64 1.15-502.fc39 updates 21 k perl-File-Basename noarch 2.86-502.fc39 updates 17 k perl-File-Find noarch 1.43-502.fc39 updates 25 k perl-File-Path noarch 2.18-500.fc39 fedora 35 k perl-File-Temp noarch 1:0.231.100-500.fc39 fedora 58 k perl-File-stat noarch 1.13-502.fc39 updates 17 k perl-FileHandle noarch 2.05-502.fc39 updates 16 k perl-Getopt-Long noarch 1:2.54-500.fc39 fedora 60 k perl-Getopt-Std noarch 1.13-502.fc39 updates 16 k perl-Git noarch 2.44.0-1.fc39 updates 40 k perl-HTTP-Tiny noarch 0.088-3.fc39 fedora 56 k perl-IO aarch64 1.52-502.fc39 updates 83 k perl-IO-Socket-IP noarch 0.42-1.fc39 fedora 42 k perl-IO-Socket-SSL noarch 2.083-3.fc39 fedora 225 k perl-IPC-Open3 noarch 1.22-502.fc39 updates 22 k perl-MIME-Base64 aarch64 3.16-500.fc39 fedora 30 k perl-Mozilla-CA noarch 20230801-1.fc39 fedora 13 k perl-Net-SSLeay aarch64 1.92-10.fc39 fedora 356 k perl-POSIX aarch64 2.13-502.fc39 updates 98 k perl-PathTools aarch64 3.89-500.fc39 fedora 88 k perl-Pod-Escapes noarch 1:1.07-500.fc39 fedora 20 k perl-Pod-Perldoc noarch 3.28.01-501.fc39 fedora 86 k perl-Pod-Simple noarch 1:3.45-4.fc39 fedora 218 k perl-Pod-Usage noarch 4:2.03-500.fc39 fedora 39 k perl-Scalar-List-Utils aarch64 5:1.63-500.fc39 fedora 71 k perl-SelectSaver noarch 1.02-502.fc39 updates 12 k perl-Socket aarch64 4:2.037-3.fc39 fedora 56 k perl-Storable aarch64 1:3.32-500.fc39 fedora 97 k perl-Symbol noarch 1.09-502.fc39 updates 14 k perl-Term-ANSIColor noarch 5.01-501.fc39 fedora 47 k perl-Term-Cap noarch 1.18-500.fc39 fedora 22 k perl-TermReadKey aarch64 2.38-18.fc39 fedora 35 k perl-Text-ParseWords noarch 3.31-500.fc39 fedora 16 k perl-Text-Tabs+Wrap noarch 2023.0511-3.fc39 fedora 22 k perl-Time-Local noarch 2:1.350-3.fc39 fedora 34 k perl-URI noarch 5.21-1.fc39 fedora 125 k perl-base noarch 2.27-502.fc39 updates 16 k perl-constant noarch 1.33-501.fc39 fedora 22 k perl-if noarch 0.61.000-502.fc39 updates 14 k perl-interpreter aarch64 4:5.38.2-502.fc39 updates 72 k perl-lib aarch64 0.65-502.fc39 updates 15 k perl-libnet noarch 3.15-501.fc39 fedora 129 k perl-libs aarch64 4:5.38.2-502.fc39 updates 2.3 M perl-locale noarch 1.10-502.fc39 updates 14 k perl-mro aarch64 1.28-502.fc39 updates 29 k perl-overload noarch 1.37-502.fc39 updates 46 k perl-overloading noarch 0.02-502.fc39 updates 13 k perl-parent noarch 1:0.241-500.fc39 fedora 14 k perl-podlators noarch 1:5.01-500.fc39 fedora 125 k perl-vars noarch 1.05-502.fc39 updates 13 k pixman aarch64 0.42.2-2.fc39 fedora 216 k poppler aarch64 23.08.0-1.fc39 fedora 1.1 M poppler-data noarch 0.4.11-5.fc39 fedora 2.0 M poppler-glib aarch64 23.08.0-1.fc39 fedora 178 k procps-ng aarch64 4.0.3-5.fc39 updates 373 k proj aarch64 9.2.1-2.fc39 fedora 1.3 M proj-data noarch 9.2.1-2.fc39 fedora 1.3 M protobuf aarch64 3.19.6-6.fc39 fedora 923 k protobuf-compat aarch64 3.21.9-2.fc39 copr_base 989 k pthreadpool aarch64 1:0.1-20240121.0.git178e3e06.fc39 copr_base 34 
k pugixml aarch64 1.13-3.fc39 fedora 96 k pyproject-rpm-macros noarch 1.12.0-1.fc39 updates 41 k python-pip-wheel noarch 23.2.1-1.fc39 fedora 1.5 M python-rpm-macros noarch 3.12-4.fc39 fedora 19 k python3 aarch64 3.12.2-2.fc39 updates 27 k python3-libs aarch64 3.12.2-2.fc39 updates 9.1 M python3-packaging noarch 23.1-4.fc39 fedora 114 k python3-rpm-generators noarch 14-7.fc39 fedora 30 k python3-rpm-macros noarch 3.12-4.fc39 fedora 14 k qnnpack aarch64 0-20190828.2.git7d2a4e99.fc38 copr_base 43 k qt-settings noarch 39.1-1.fc39 updates 9.6 k qt5-qtbase aarch64 5.15.12-5.fc39 updates 3.5 M qt5-qtbase-common noarch 5.15.12-5.fc39 updates 12 k qt5-qtbase-gui aarch64 5.15.12-5.fc39 updates 6.3 M rav1e-libs aarch64 0.7.1-1.fc39 updates 798 k re2 aarch64 1:20220601-3.fc39 fedora 187 k rhash aarch64 1.4.3-3.fc39 fedora 192 k rocksdb aarch64 8.1.1-2.fc39 fedora 2.7 M rsvg-pixbuf-loader aarch64 2.57.1-1.fc39 updates 16 k samba-client-libs aarch64 2:4.19.5-1.fc39 updates 5.6 M samba-common noarch 2:4.19.5-1.fc39 updates 152 k samba-common-libs aarch64 2:4.19.5-1.fc39 updates 115 k scotch aarch64 7.0.3-3.fc39 fedora 276 k scotch-devel aarch64 7.0.3-3.fc39 fedora 25 k shared-mime-info aarch64 2.2-4.fc39 fedora 380 k sleef aarch64 3.6-20240320.0.git60e76d2b.fc39 copr_base 494 k snappy aarch64 1.1.10-2.fc39 fedora 37 k soxr aarch64 0.1.3-14.fc39 fedora 71 k speex aarch64 1.2.0-15.fc39 fedora 64 k srt-libs aarch64 1.5.3-1.fc39 fedora 350 k suitesparse aarch64 5.13.0-3.fc39 fedora 1.0 M svt-av1-libs aarch64 1.4.1-3.fc39 fedora 1.0 M systemd aarch64 254.10-1.fc39 updates 4.6 M systemd-pam aarch64 254.10-1.fc39 updates 352 k systemd-rpm-macros noarch 254.10-1.fc39 updates 28 k tbb aarch64 2020.3-20.fc39 fedora 140 k tcl aarch64 1:8.6.12-5.fc39 fedora 1.1 M tensorpipe aarch64 0-20220513.1.gitbb1473a4.fc37 copr_base 740 k twolame-libs aarch64 0.4.0-3.fc39 fedora 69 k tzdata noarch 2024a-2.fc39 updates 715 k tzdata-java noarch 2024a-2.fc39 updates 208 k unixODBC aarch64 2.3.11-4.fc39 fedora 470 k uriparser aarch64 0.9.7-3.fc39 fedora 60 k urw-base35-bookman-fonts noarch 20200910-18.fc39 fedora 847 k urw-base35-c059-fonts noarch 20200910-18.fc39 fedora 874 k urw-base35-d050000l-fonts noarch 20200910-18.fc39 fedora 76 k urw-base35-fonts noarch 20200910-18.fc39 fedora 10 k urw-base35-fonts-common noarch 20200910-18.fc39 fedora 21 k urw-base35-gothic-fonts noarch 20200910-18.fc39 fedora 643 k urw-base35-nimbus-mono-ps-fonts noarch 20200910-18.fc39 fedora 795 k urw-base35-nimbus-roman-fonts noarch 20200910-18.fc39 fedora 856 k urw-base35-nimbus-sans-fonts noarch 20200910-18.fc39 fedora 1.3 M urw-base35-p052-fonts noarch 20200910-18.fc39 fedora 974 k urw-base35-standard-symbols-ps-fonts noarch 20200910-18.fc39 fedora 42 k urw-base35-z003-fonts noarch 20200910-18.fc39 fedora 276 k utf8proc aarch64 2.7.0-5.fc39 fedora 80 k vapoursynth-libs aarch64 63-2.fc39 fedora 323 k vim-filesystem noarch 2:9.1.264-1.fc39 updates 17 k vo-amrwbenc aarch64 0.1.3-19.fc39 fedora 77 k vtk aarch64 9.2.6-7.fc39 fedora 22 M xapian-core-libs aarch64 1.4.23-1.fc39 fedora 710 k xcb-util aarch64 0.4.1-3.fc39 fedora 19 k xcb-util-image aarch64 0.4.1-3.fc39 fedora 19 k xcb-util-keysyms aarch64 0.4.1-3.fc39 fedora 14 k xcb-util-renderutil aarch64 0.3.10-3.fc39 fedora 17 k xcb-util-wm aarch64 0.4.2-3.fc39 fedora 31 k xerces-c aarch64 3.2.5-1.fc39 updates 884 k xkeyboard-config noarch 2.40-1.fc39 updates 971 k xml-common noarch 0.6.3-61.fc39 fedora 31 k xorg-x11-proto-devel noarch 2023.2-2.fc39 fedora 298 k xvidcore aarch64 1.3.7-10.fc39 fedora 227 
k zeromq aarch64 4.3.4-8.fc39 fedora 452 k zimg aarch64 3.0.5-1.fc39 updates 139 k zlib-devel aarch64 1.2.13-4.fc39 fedora 45 k zvbi aarch64 0.2.35-21.fc39 fedora 418 k
Transaction Summary
=====================================================================================================================================================================
Install 591 Packages

Total download size: 2.4 G
Installed size: 7.9 G
Downloading Packages:
(1/591): asmjit-0-20220702.1.gitc5984762.fc39.a 4.3 MB/s | 204 kB 00:00 (2/591): cpuinfo-0-20240327.0.gitf42f5eaf.fc39. 920 kB/s | 47 kB 00:00 (3/591): asmjit-devel-0-20220702.1.gitc5984762. 3.9 MB/s | 230 kB 00:00 (4/591): cpuinfo-devel-0-20240327.0.gitf42f5eaf 1.3 MB/s | 24 kB 00:00 (5/591): cuda-gcc-12-c++-12.3.1-1.fc39.aarch64. 95 MB/s | 14 MB 00:00 (6/591): cutlass-devel-3.4.1-20240215.0.cu12_3. 23 MB/s | 774 kB 00:00 (7/591): cuda-gcc-12-12.3.1-1.fc39.aarch64.rpm 132 MB/s | 29 MB 00:00 (8/591): foxi-0-20210526.1.gitc278588e.fc37.aar 324 kB/s | 12 kB 00:00 (9/591): fp16-0-20240410.0.git581ac1c7.fc39.aar 693 kB/s | 12 kB 00:00 (10/591): foxi-devel-0-20210526.1.gitc278588e.f 844 kB/s | 25 kB 00:00 (11/591): fp16-devel-0-20240410.0.git581ac1c7.f 412 kB/s | 13 kB 00:00 (12/591): fxdiv-devel-0-20201208.1.git63058eff. 445 kB/s | 12 kB 00:00 (13/591): gemmlowp-devel-0-20231104.0.git16e866 7.4 MB/s | 157 kB 00:00 (14/591): glibc-devel-2.38-99.fc39.aarch64.rpm 134 MB/s | 498 kB 00:00 (15/591): gklib-5.1.1-20230326.0.git8bd6bad7.fc 2.7 MB/s | 93 kB 00:00 (16/591): gloo-0.5.0-20240302.0.git2565674c.cu1 18 MB/s | 747 kB 00:00 (17/591): gloo-devel-0.5.0-20240302.0.git256567 2.5 MB/s | 74 kB 00:00 (18/591): kineto-0.4.0-20240327.0.git445909a8.c 17 MB/s | 278 kB 00:00 (19/591): kineto-devel-0.4.0-20240327.0.git4459 1.1 MB/s | 23 kB 00:00 (20/591): halide-17.0.1-20240220.0.fc39.aarch64 90 MB/s | 20 MB 00:00 (21/591): magma-devel-2.8.0-20240328.0.cu12_3.f 37 MB/s | 985 kB 00:00 (22/591): metis-5.2.1-20230403.0.gite0f1b88b.fc 5.5 MB/s | 174 kB 00:00 (23/591): neon2sse-devel-0-20230131.0.git097a5e 2.8 MB/s | 85 kB 00:00 (24/591): nnpack-0-20230201.0.git70a77f48.fc38. 2.7 MB/s | 81 kB 00:00 (25/591): nnpack-devel-0-20230201.0.git70a77f48 451 kB/s | 16 kB 00:00 (26/591): onnx-devel-1.17.0-20240404.0.git4128a 9.8 MB/s | 129 kB 00:00 (27/591): onnx-libs-1.17.0-20240404.0.git4128a0 27 MB/s | 779 kB 00:00 (28/591): onnx-optimizer-0.3.19-20240303.0.gitb 5.0 MB/s | 188 kB 00:00 (29/591): onnx-optimizer-devel-0.3.19-20240303. 2.8 MB/s | 50 kB 00:00 (30/591): opencv-4.9.0-20231227.1.cu12_3.fc39.a 56 MB/s | 4.2 MB 00:00 (31/591): opencv-contrib-4.9.0-20231227.1.cu12_ 54 MB/s | 5.5 MB 00:00 (32/591): opencv-core-4.9.0-20231227.1.cu12_3.f 64 MB/s | 8.9 MB 00:00 (33/591): cutlass-3.4.1-20240215.0.cu12_3.fc39. 110 MB/s | 179 MB 00:01 (34/591): magma-2.8.0-20240328.0.cu12_3.fc39.aa 86 MB/s | 119 MB 00:01 (35/591): opencv-devel-4.9.0-20231227.1.cu12_3. 9.8 MB/s | 1.3 MB 00:00 (36/591): opencv-static-4.9.0-20231227.1.cu12_3 20 MB/s | 390 kB 00:00 (37/591): protobuf-compat-3.21.9-2.fc39.aarch64 49 MB/s | 989 kB 00:00 (38/591): peachpy-python3-0-20221113.1.git349e8 21 MB/s | 674 kB 00:00 (39/591): protobuf-compat-devel-3.21.9-2.fc39.a 49 MB/s | 374 kB 00:00 (40/591): protobuf-compat-compiler-3.21.9-2.fc3 60 MB/s | 834 kB 00:00 (41/591): psimd-devel-0-20200517.2.git072586a7.
1.0 MB/s | 13 kB 00:00 (42/591): pthreadpool-0.1-20240121.0.git178e3e0 1.7 MB/s | 34 kB 00:00 (43/591): pthreadpool-devel-0.1-20240121.0.git1 696 kB/s | 15 kB 00:00 (44/591): qnnpack-0-20190828.2.git7d2a4e99.fc38 1.7 MB/s | 43 kB 00:00 (45/591): opencv-cuda-4.9.0-20231227.1.cu12_3.f 48 MB/s | 37 MB 00:00 (46/591): qnnpack-devel-0-20190828.2.git7d2a4e9 206 kB/s | 12 kB 00:00 (47/591): sleef-3.6-20240320.0.git60e76d2b.fc39 10 MB/s | 494 kB 00:00 (48/591): sleef-devel-3.6-20240320.0.git60e76d2 1.7 MB/s | 24 kB 00:00 (49/591): tensorpipe-devel-0-20220513.1.gitbb14 6.5 MB/s | 109 kB 00:00 (50/591): tensorpipe-0-20220513.1.gitbb1473a4.f 33 MB/s | 740 kB 00:00 (51/591): libcublas-devel-12-3-12.3.4.1-2.aarch 2.9 MB/s | 75 kB 00:00 (52/591): libcudnn8-devel-8.9.7.29-2.cuda12.3.a 3.0 MB/s | 34 kB 00:00 (53/591): libcufft-12-3-11.0.12.1-2.aarch64.rpm 72 MB/s | 60 MB 00:00 (54/591): libcufft-devel-12-3-11.0.12.1-2.aarch 995 kB/s | 33 kB 00:00 (55/591): libcusolver-12-3-11.5.4.101-2.aarch64 102 MB/s | 76 MB 00:00 (56/591): libcusolver-devel-12-3-11.5.4.101-2.a 1.5 MB/s | 61 kB 00:00 (57/591): libcublas-12-3-12.3.4.1-2.aarch64.rpm 93 MB/s | 245 MB 00:02 (58/591): libcusparse-12-3-12.2.0.103-2.aarch64 65 MB/s | 108 MB 00:01 (59/591): libcusparse-devel-12-3-12.2.0.103-2.a 70 MB/s | 108 MB 00:01 (60/591): cuda-toolkit-12-3-config-common-12.3. 1.6 MB/s | 7.7 kB 00:00 (61/591): cuda-toolkit-12-config-common-12.4.12 3.1 MB/s | 7.9 kB 00:00 (62/591): cuda-toolkit-config-common-12.4.127-1 1.6 MB/s | 7.9 kB 00:00 (63/591): cuda-cccl-12-3-12.3.101-1.aarch64.rpm 142 MB/s | 1.9 MB 00:00 (64/591): cuda-crt-12-3-12.3.107-1.aarch64.rpm 29 MB/s | 111 kB 00:00 (65/591): cuda-cudart-12-3-12.3.101-1.aarch64.r 63 MB/s | 233 kB 00:00 (66/591): cuda-cudart-devel-12-3-12.3.101-1.aar 136 MB/s | 2.0 MB 00:00 (67/591): libnpp-12-3-12.2.3.2-2.aarch64.rpm 86 MB/s | 96 MB 00:01 (68/591): cuda-cupti-12-3-12.3.101-1.aarch64.rp 57 MB/s | 15 MB 00:00 (69/591): cuda-driver-devel-12-3-12.3.101-1.aar 581 kB/s | 43 kB 00:00 (70/591): cuda-nvml-devel-12-3-12.3.101-1.aarch 25 MB/s | 119 kB 00:00 (71/591): libcudnn8-8.9.7.29-2.cuda12.3.aarch64 89 MB/s | 446 MB 00:05 (72/591): cuda-nvrtc-12-3-12.3.107-1.aarch64.rp 48 MB/s | 23 MB 00:00 (73/591): cuda-nvtx-12-3-12.3.101-1.aarch64.rpm 1.3 MB/s | 89 kB 00:00 (74/591): cuda-nvcc-12-3-12.3.107-1.aarch64.rpm 85 MB/s | 59 MB 00:00 (75/591): cuda-nvvm-12-3-12.3.107-1.aarch64.rpm 183 MB/s | 25 MB 00:00 (76/591): cuda-profiler-api-12-3-12.3.101-1.aar 401 kB/s | 26 kB 00:00 (77/591): cuda-nvrtc-devel-12-3-12.3.107-1.aarc 55 MB/s | 22 MB 00:00 (78/591): libnccl-2.21.5-1+cuda12.4.aarch64.rpm 246 MB/s | 130 MB 00:00 (79/591): libnccl-devel-2.21.5-1+cuda12.4.aarch 3.0 MB/s | 16 kB 00:00 (80/591): libcurand-12-3-10.3.4.107-1.aarch64.r 63 MB/s | 53 MB 00:00 (81/591): libcurand-devel-12-3-10.3.4.107-1.aar 60 MB/s | 53 MB 00:00 (82/591): Lmod-8.7.32-1.fc39.aarch64.rpm 15 MB/s | 262 kB 00:00 (83/591): MUMPS-5.5.1-5.fc39.aarch64.rpm 64 MB/s | 1.8 MB 00:00 (84/591): MUMPS-common-5.5.1-5.fc39.noarch.rpm 76 MB/s | 830 kB 00:00 (85/591): SuperLU-6.0.0-1.fc39.aarch64.rpm 49 MB/s | 172 kB 00:00 (86/591): abattis-cantarell-vf-fonts-0.301-10.f 44 MB/s | 121 kB 00:00 (87/591): adobe-mappings-cmap-20230622-1.fc39.n 219 MB/s | 2.1 MB 00:00 (88/591): adobe-mappings-cmap-deprecated-202306 40 MB/s | 113 kB 00:00 (89/591): adobe-mappings-pdf-20190401-5.fc39.no 127 MB/s | 698 kB 00:00 (90/591): avahi-libs-0.8-24.fc39.aarch64.rpm 27 MB/s | 67 kB 00:00 (91/591): byte-buddy-1.14.2-2.fc39.noarch.rpm 150 MB/s | 3.2 MB 00:00 (92/591): 
byte-buddy-agent-1.14.2-2.fc39.noarch 34 MB/s | 215 kB 00:00 (93/591): cairo-1.18.0-1.fc39.aarch64.rpm 109 MB/s | 692 kB 00:00 (94/591): cairo-gobject-1.18.0-1.fc39.aarch64.r 5.5 MB/s | 18 kB 00:00 (95/591): cdparanoia-libs-10.2-42.fc39.aarch64. 14 MB/s | 53 kB 00:00 (96/591): libnvjitlink-12-3-12.3.101-1.aarch64. 48 MB/s | 19 MB 00:00 (97/591): ceres-solver-2.1.0-6.fc39.aarch64.rpm 9.4 MB/s | 657 kB 00:00 (98/591): cgnslib-libs-4.4.0-2.fc39.aarch64.rpm 41 MB/s | 294 kB 00:00 (99/591): cfitsio-4.3.0-1.fc39.aarch64.rpm 42 MB/s | 585 kB 00:00 (100/591): cjson-1.7.15-2.fc39.aarch64.rpm 4.7 MB/s | 32 kB 00:00 (101/591): clang16-resource-filesystem-16.0.6-3 2.5 MB/s | 13 kB 00:00 (102/591): cliquer-libs-1.22-6.fc39.aarch64.rpm 6.3 MB/s | 38 kB 00:00 (103/591): libnvjitlink-devel-12-3-12.3.101-1.a 54 MB/s | 17 MB 00:00 (104/591): cmake-data-3.27.7-1.fc39.noarch.rpm 144 MB/s | 2.2 MB 00:00 (105/591): cmake-3.27.7-1.fc39.aarch64.rpm 105 MB/s | 7.4 MB 00:00 (106/591): cmake-filesystem-3.27.7-1.fc39.aarch 1.5 MB/s | 19 kB 00:00 (107/591): cmake-rpm-macros-3.27.7-1.fc39.noarc 5.0 MB/s | 18 kB 00:00 (108/591): codec2-1.2.0-2.fc39.aarch64.rpm 53 MB/s | 636 kB 00:00 (109/591): coin-or-Cbc-2.10.5-13.fc39.aarch64.r 65 MB/s | 765 kB 00:00 (110/591): coin-or-Cgl-0.60.3-10.fc39.aarch64.r 74 MB/s | 387 kB 00:00 (111/591): coin-or-CoinUtils-2.11.4-10.fc39.aar 73 MB/s | 443 kB 00:00 (112/591): coin-or-Osi-0.108.6-9.fc39.aarch64.r 21 MB/s | 292 kB 00:00 (113/591): clang16-libs-16.0.6-3.fc39.aarch64.r 139 MB/s | 21 MB 00:00 (114/591): coin-or-Clp-1.17.6-13.fc39.aarch64.r 18 MB/s | 843 kB 00:00 (115/591): copy-jdk-configs-4.1-3.fc39.noarch.r 1.3 MB/s | 28 kB 00:00 (116/591): dbus-1.14.10-1.fc39.aarch64.rpm 4.3 MB/s | 8.1 kB 00:00 (117/591): dbus-libs-1.14.10-1.fc39.aarch64.rpm 77 MB/s | 156 kB 00:00 (118/591): dbus-common-1.14.10-1.fc39.noarch.rp 3.7 MB/s | 15 kB 00:00 (119/591): default-fonts-core-sans-4.0-9.fc39.n 8.3 MB/s | 32 kB 00:00 (120/591): double-conversion-3.1.5-9.fc39.aarch 9.2 MB/s | 46 kB 00:00 (121/591): duktape-2.7.0-5.fc39.aarch64.rpm 26 MB/s | 170 kB 00:00 (122/591): eigen3-devel-3.4.0-12.fc39.noarch.rp 120 MB/s | 1.2 MB 00:00 (123/591): fdk-aac-free-2.0.0-11.fc39.aarch64.r 38 MB/s | 326 kB 00:00 (124/591): doxygen-1.9.7-3.fc39.aarch64.rpm 192 MB/s | 4.8 MB 00:00 (125/591): flatbuffers-23.5.26-3.fc39.aarch64.r 3.6 MB/s | 185 kB 00:00 (126/591): fonts-filesystem-2.0.5-12.fc39.noarc 3.0 MB/s | 8.2 kB 00:00 (127/591): freetype-2.13.1-2.fc39.aarch64.rpm 102 MB/s | 406 kB 00:00 (128/591): freexl-2.0.0-2.fc39.aarch64.rpm 8.2 MB/s | 45 kB 00:00 (129/591): flatbuffers-devel-23.5.26-3.fc39.aar 2.0 MB/s | 111 kB 00:00 (130/591): game-music-emu-0.6.3-12.fc39.aarch64 25 MB/s | 151 kB 00:00 (131/591): fribidi-1.0.13-2.fc39.aarch64.rpm 11 MB/s | 91 kB 00:00 (132/591): flatbuffers-compiler-23.5.26-3.fc39. 
13 MB/s | 926 kB 00:00 (133/591): gd-2.3.3-12.fc39.aarch64.rpm 38 MB/s | 133 kB 00:00 (134/591): gc-8.2.2-4.fc39.aarch64.rpm 13 MB/s | 110 kB 00:00 (135/591): gdk-pixbuf2-2.42.10-5.fc39.aarch64.r 53 MB/s | 482 kB 00:00 (136/591): gdk-pixbuf2-modules-2.42.10-5.fc39.a 9.8 MB/s | 87 kB 00:00 (137/591): gecode-6.2.0-12.fc39.aarch64.rpm 103 MB/s | 2.9 MB 00:00 (138/591): gl-manpages-1.1-28.20190306.fc39.noa 71 MB/s | 1.2 MB 00:00 (139/591): gflags-2.2.2-12.fc39.aarch64.rpm 1.9 MB/s | 86 kB 00:00 (140/591): gflags-devel-2.2.2-12.fc39.aarch64.r 365 kB/s | 24 kB 00:00 (141/591): glpk-5.0-7.fc39.aarch64.rpm 47 MB/s | 355 kB 00:00 (142/591): glx-utils-9.0.0-3.fc39.aarch64.rpm 15 MB/s | 79 kB 00:00 (143/591): gmp-c++-6.2.1-5.fc39.aarch64.rpm 7.2 MB/s | 18 kB 00:00 (144/591): glog-0.3.5-18.fc39.aarch64.rpm 1.4 MB/s | 65 kB 00:00 (145/591): gmp-devel-6.2.1-5.fc39.aarch64.rpm 39 MB/s | 174 kB 00:00 (146/591): graphene-1.10.6-6.fc39.aarch64.rpm 15 MB/s | 62 kB 00:00 (147/591): graphite2-1.3.14-12.fc39.aarch64.rpm 31 MB/s | 93 kB 00:00 (148/591): gsl-2.7.1-5.fc39.aarch64.rpm 96 MB/s | 1.0 MB 00:00 (149/591): gsm-1.0.22-3.fc39.aarch64.rpm 15 MB/s | 36 kB 00:00 (150/591): gts-0.7.6-46.20121130.fc39.aarch64.r 66 MB/s | 234 kB 00:00 (151/591): glog-devel-0.3.5-18.fc39.aarch64.rpm 538 kB/s | 38 kB 00:00 (152/591): harfbuzz-8.2.1-2.fc39.aarch64.rpm 83 MB/s | 934 kB 00:00 (153/591): hdf-libs-4.2.15-13.fc39.aarch64.rpm 17 MB/s | 279 kB 00:00 (154/591): guile22-2.2.7-9.fc39.aarch64.rpm 179 MB/s | 6.5 MB 00:00 (155/591): google-droid-sans-fonts-20200215-17. 39 MB/s | 2.7 MB 00:00 (156/591): hiredis-1.0.2-5.fc39.aarch64.rpm 7.3 MB/s | 42 kB 00:00 (157/591): ilbc-3.0.4-7.fc39.aarch64.rpm 12 MB/s | 52 kB 00:00 (158/591): hiredis-devel-1.0.2-5.fc39.aarch64.r 6.8 MB/s | 37 kB 00:00 (159/591): infiniband-diags-46.0-4.fc39.aarch64 63 MB/s | 336 kB 00:00 (160/591): isl-0.16.1-18.fc39.aarch64.rpm 121 MB/s | 838 kB 00:00 (161/591): hdf5-1.12.1-12.fc39.aarch64.rpm 29 MB/s | 2.1 MB 00:00 (162/591): javapackages-filesystem-6.1.0-10.fc3 1.7 MB/s | 12 kB 00:00 (163/591): javapackages-tools-6.1.0-10.fc39.noa 8.2 MB/s | 37 kB 00:00 (164/591): jbig2dec-libs-0.19-10.fc39.aarch64.r 24 MB/s | 71 kB 00:00 (165/591): jacop-4.9.0-2.fc39.noarch.rpm 25 MB/s | 1.7 MB 00:00 (166/591): jbigkit-libs-2.1-26.fc39.aarch64.rpm 12 MB/s | 53 kB 00:00 (167/591): json-c-0.17-1.fc39.aarch64.rpm 12 MB/s | 44 kB 00:00 (168/591): kmod-libs-30-6.fc39.aarch64.rpm 22 MB/s | 67 kB 00:00 (169/591): jsoncpp-1.9.5-5.fc39.aarch64.rpm 15 MB/s | 91 kB 00:00 (170/591): lasi-1.1.3-11.fc39.aarch64.rpm 14 MB/s | 53 kB 00:00 (171/591): lame-libs-3.100-15.fc39.aarch64.rpm 57 MB/s | 335 kB 00:00 (172/591): iso-codes-4.15.0-2.fc39.noarch.rpm 41 MB/s | 3.5 MB 00:00 (173/591): lcms2-2.15-2.fc39.aarch64.rpm 27 MB/s | 176 kB 00:00 (174/591): less-633-2.fc39.aarch64.rpm 21 MB/s | 176 kB 00:00 (175/591): libGLEW-2.2.0-5.fc39.aarch64.rpm 31 MB/s | 177 kB 00:00 (176/591): libICE-1.0.10-11.fc39.aarch64.rpm 30 MB/s | 70 kB 00:00 (177/591): libSM-1.2.3-13.fc39.aarch64.rpm 12 MB/s | 41 kB 00:00 (178/591): leveldb-1.23-7.fc39.aarch64.rpm 7.2 MB/s | 146 kB 00:00 (179/591): libX11-1.8.7-1.fc39.aarch64.rpm 102 MB/s | 644 kB 00:00 (180/591): libX11-common-1.8.7-1.fc39.noarch.rp 34 MB/s | 176 kB 00:00 (181/591): libX11-xcb-1.8.7-1.fc39.aarch64.rpm 2.9 MB/s | 12 kB 00:00 (182/591): libX11-devel-1.8.7-1.fc39.aarch64.rp 110 MB/s | 1.0 MB 00:00 (183/591): libXau-1.0.11-3.fc39.aarch64.rpm 13 MB/s | 32 kB 00:00 (184/591): libXau-devel-1.0.11-3.fc39.aarch64.r 4.2 MB/s | 14 kB 00:00 (185/591): 
libXcursor-1.2.1-4.fc39.aarch64.rpm 10 MB/s | 30 kB 00:00 (186/591): libXext-1.3.5-3.fc39.aarch64.rpm 16 MB/s | 39 kB 00:00 (187/591): libXfixes-6.0.0-6.fc39.aarch64.rpm 9.0 MB/s | 19 kB 00:00 (188/591): libXi-1.8.1-2.fc39.aarch64.rpm 18 MB/s | 39 kB 00:00 (189/591): libXft-2.3.8-3.fc39.aarch64.rpm 20 MB/s | 71 kB 00:00 (190/591): libXt-1.2.1-5.fc39.aarch64.rpm 40 MB/s | 176 kB 00:00 (191/591): libXrender-0.9.11-3.fc39.aarch64.rpm 4.3 MB/s | 27 kB 00:00 (192/591): libXv-1.0.11-19.fc39.aarch64.rpm 2.3 MB/s | 18 kB 00:00 (193/591): libXxf86vm-1.1.5-3.fc39.aarch64.rpm 2.2 MB/s | 18 kB 00:00 (194/591): libavif-0.11.1-11.fc39.aarch64.rpm 12 MB/s | 80 kB 00:00 (195/591): libb2-0.98.1-9.fc39.aarch64.rpm 1.6 MB/s | 24 kB 00:00 (196/591): leveldb-devel-1.23-7.fc39.aarch64.rp 680 kB/s | 53 kB 00:00 (197/591): libbluray-1.3.4-3.fc39.aarch64.rpm 8.3 MB/s | 162 kB 00:00 (198/591): libcbor-0.10.2-2.fc39.aarch64.rpm 4.1 MB/s | 57 kB 00:00 (199/591): libchromaprint-1.5.1-13.fc39.aarch64 5.5 MB/s | 39 kB 00:00 (200/591): libcom_err-devel-1.47.0-2.fc39.aarch 2.7 MB/s | 16 kB 00:00 (201/591): libdatrie-0.2.13-7.fc39.aarch64.rpm 8.4 MB/s | 32 kB 00:00 (202/591): libdav1d-1.2.1-2.fc39.aarch64.rpm 75 MB/s | 350 kB 00:00 (203/591): libdc1394-2.2.7-3.fc39.aarch64.rpm 32 MB/s | 130 kB 00:00 (204/591): libedit-3.1-48.20230828cvs.fc39.aarc 27 MB/s | 107 kB 00:00 (205/591): libfido2-1.13.0-3.fc39.aarch64.rpm 40 MB/s | 96 kB 00:00 (206/591): libevdev-1.13.1-2.fc39.aarch64.rpm 9.6 MB/s | 42 kB 00:00 (207/591): libgcrypt-1.10.2-2.fc39.aarch64.rpm 99 MB/s | 451 kB 00:00 (208/591): libglvnd-1.7.0-1.fc39.aarch64.rpm 35 MB/s | 126 kB 00:00 (209/591): libgeotiff-1.7.1-9.fc39.aarch64.rpm 20 MB/s | 104 kB 00:00 (210/591): libglvnd-core-devel-1.7.0-1.fc39.aar 7.2 MB/s | 17 kB 00:00 (211/591): libglvnd-egl-1.7.0-1.fc39.aarch64.rp 16 MB/s | 37 kB 00:00 (212/591): libglvnd-devel-1.7.0-1.fc39.aarch64. 
44 MB/s | 162 kB 00:00 (213/591): libglvnd-gles-1.7.0-1.fc39.aarch64.r 13 MB/s | 32 kB 00:00 (214/591): libglvnd-glx-1.7.0-1.fc39.aarch64.rp 40 MB/s | 138 kB 00:00 (215/591): libglvnd-opengl-1.7.0-1.fc39.aarch64 15 MB/s | 44 kB 00:00 (216/591): libgpg-error-1.47-2.fc39.aarch64.rpm 68 MB/s | 230 kB 00:00 (217/591): libgta-1.2.1-10.fc39.aarch64.rpm 8.7 MB/s | 35 kB 00:00 (218/591): libgudev-238-2.fc39.aarch64.rpm 6.0 MB/s | 34 kB 00:00 (219/591): libharu-2.4.3-3.fc39.aarch64.rpm 77 MB/s | 580 kB 00:00 (220/591): libibumad-46.0-4.fc39.aarch64.rpm 4.9 MB/s | 27 kB 00:00 (221/591): libijs-0.35-19.fc39.aarch64.rpm 8.3 MB/s | 29 kB 00:00 (222/591): libibverbs-46.0-4.fc39.aarch64.rpm 44 MB/s | 430 kB 00:00 (223/591): libjpeg-turbo-2.1.4-3.fc39.aarch64.r 10 MB/s | 196 kB 00:00 (224/591): libjxl-0.8.2-3.fc39.aarch64.rpm 30 MB/s | 777 kB 00:00 (225/591): libldb-2.8.0-1.fc39.aarch64.rpm 12 MB/s | 185 kB 00:00 (226/591): libkml-1.3.0-45.fc39.aarch64.rpm 11 MB/s | 331 kB 00:00 (227/591): libicu-73.2-2.fc39.aarch64.rpm 164 MB/s | 10 MB 00:00 (228/591): liblerc-4.0.0-4.fc39.aarch64.rpm 11 MB/s | 179 kB 00:00 (229/591): libmodplug-0.8.9.0-17.fc39.aarch64.r 15 MB/s | 170 kB 00:00 (230/591): libmpc-1.3.1-3.fc39.aarch64.rpm 20 MB/s | 72 kB 00:00 (231/591): libpaper-2.1.1-1.fc39.aarch64.rpm 5.9 MB/s | 27 kB 00:00 (232/591): libogg-1.3.5-6.fc39.aarch64.rpm 5.6 MB/s | 33 kB 00:00 (233/591): libpng-1.6.37-15.fc39.aarch64.rpm 15 MB/s | 115 kB 00:00 (234/591): libpq-15.3-1.fc39.aarch64.rpm 24 MB/s | 212 kB 00:00 (235/591): libqhull_r-7.2.1-13.fc39.aarch64.rpm 18 MB/s | 164 kB 00:00 (236/591): librabbitmq-0.13.0-3.fc39.aarch64.rp 7.9 MB/s | 44 kB 00:00 (237/591): librdmacm-46.0-4.fc39.aarch64.rpm 27 MB/s | 72 kB 00:00 (238/591): libraw1394-2.1.2-18.fc39.aarch64.rpm 14 MB/s | 65 kB 00:00 (239/591): libseccomp-2.5.3-6.fc39.aarch64.rpm 27 MB/s | 72 kB 00:00 (240/591): librist-0.2.7-2.fc39.aarch64.rpm 14 MB/s | 78 kB 00:00 (241/591): librttopo-1.1.0-12.fc39.aarch64.rpm 40 MB/s | 203 kB 00:00 (242/591): libsepol-devel-3.5-2.fc39.aarch64.rp 11 MB/s | 49 kB 00:00 (243/591): libselinux-devel-3.5-5.fc39.aarch64. 26 MB/s | 151 kB 00:00 (244/591): libtalloc-2.4.1-1.fc39.aarch64.rpm 9.0 MB/s | 30 kB 00:00 (245/591): libtdb-1.4.9-1.fc39.aarch64.rpm 11 MB/s | 52 kB 00:00 (246/591): libthai-0.1.29-6.fc39.aarch64.rpm 32 MB/s | 213 kB 00:00 (247/591): libtevent-0.15.0-1.fc39.aarch64.rpm 4.8 MB/s | 45 kB 00:00 (248/591): libspatialite-5.0.1-23.fc39.aarch64. 116 MB/s | 2.8 MB 00:00 (249/591): libtheora-1.1.1-34.fc39.aarch64.rpm 18 MB/s | 163 kB 00:00 (250/591): libtiff-4.4.0-8.fc39.aarch64.rpm 20 MB/s | 196 kB 00:00 (251/591): libudfread-1.1.2-6.fc39.aarch64.rpm 5.2 MB/s | 33 kB 00:00 (252/591): libunwind-1.7.0-0.2.rc2.fc39.aarch64 13 MB/s | 72 kB 00:00 (253/591): libtool-ltdl-2.4.7-7.fc39.aarch64.rp 3.8 MB/s | 36 kB 00:00 (254/591): libunwind-devel-1.7.0-0.2.rc2.fc39.a 23 MB/s | 91 kB 00:00 (255/591): libvdpau-1.5-4.fc39.aarch64.rpm 4.1 MB/s | 17 kB 00:00 (256/591): libverto-devel-0.3.2-6.fc39.aarch64. 
3.3 MB/s | 14 kB 00:00 (257/591): libvisual-0.4.1-2.fc39.aarch64.rpm 34 MB/s | 144 kB 00:00 (258/591): libvorbis-1.3.7-8.fc39.aarch64.rpm 45 MB/s | 191 kB 00:00 (259/591): libwayland-client-1.22.0-2.fc39.aarc 11 MB/s | 33 kB 00:00 (260/591): libwayland-cursor-1.22.0-2.fc39.aarc 9.2 MB/s | 19 kB 00:00 (261/591): libwayland-egl-1.22.0-2.fc39.aarch64 4.0 MB/s | 13 kB 00:00 (262/591): libwayland-server-1.22.0-2.fc39.aarc 12 MB/s | 42 kB 00:00 (263/591): libxcb-1.13.1-12.fc39.aarch64.rpm 82 MB/s | 238 kB 00:00 (264/591): libwebp-1.3.2-2.fc39.aarch64.rpm 36 MB/s | 243 kB 00:00 (265/591): libxcrypt-devel-4.4.36-2.fc39.aarch6 4.2 MB/s | 30 kB 00:00 (266/591): libxshmfence-1.3-13.fc39.aarch64.rpm 1.9 MB/s | 12 kB 00:00 (267/591): libxcb-devel-1.13.1-12.fc39.aarch64. 105 MB/s | 1.4 MB 00:00 (268/591): libyaml-0.2.5-12.fc39.aarch64.rpm 9.6 MB/s | 59 kB 00:00 (269/591): lksctp-tools-1.0.19-4.fc39.aarch64.r 16 MB/s | 93 kB 00:00 (270/591): lua-5.4.6-3.fc39.aarch64.rpm 33 MB/s | 189 kB 00:00 (271/591): lua-filesystem-1.8.0-9.fc39.aarch64. 6.7 MB/s | 34 kB 00:00 (272/591): lua-json-1.3.4-4.fc39.noarch.rpm 6.8 MB/s | 30 kB 00:00 (273/591): lua-lpeg-1.0.2-11.fc39.aarch64.rpm 11 MB/s | 66 kB 00:00 (274/591): lua-posix-36.2.1-3.fc39.aarch64.rpm 19 MB/s | 147 kB 00:00 (275/591): lua-term-0.07-18.fc39.aarch64.rpm 2.3 MB/s | 15 kB 00:00 (276/591): lpcnetfreedv-0.5-3.fc39.aarch64.rpm 144 MB/s | 7.3 MB 00:00 (277/591): make-4.4.1-2.fc39.aarch64.rpm 46 MB/s | 585 kB 00:00 (278/591): mesa-libGLU-9.0.3-1.fc39.aarch64.rpm 24 MB/s | 148 kB 00:00 (279/591): mesa-libGLU-devel-9.0.3-1.fc39.aarch 311 kB/s | 12 kB 00:00 (280/591): llvm16-libs-16.0.6-5.fc39.aarch64.rp 206 MB/s | 25 MB 00:00 (281/591): miniz-3.0.2-3.fc39.aarch64.rpm 1.1 MB/s | 65 kB 00:00 (282/591): mockito-3.12.4-7.fc39.noarch.rpm 71 MB/s | 582 kB 00:00 (283/591): mpdecimal-2.5.1-7.fc39.aarch64.rpm 19 MB/s | 90 kB 00:00 (284/591): mpfr-devel-4.2.0-3.fc39.aarch64.rpm 8.1 MB/s | 22 kB 00:00 (285/591): miniz-devel-3.0.2-3.fc39.aarch64.rpm 751 kB/s | 33 kB 00:00 (286/591): mpg123-libs-1.31.3-2.fc39.aarch64.rp 64 MB/s | 347 kB 00:00 (287/591): mtdev-1.1.6-6.fc39.aarch64.rpm 4.9 MB/s | 20 kB 00:00 (288/591): mp-3.1.0-42.20200303git7fd4828.fc39. 35 MB/s | 925 kB 00:00 (289/591): netcdf-4.9.0-5.fc38.aarch64.rpm 107 MB/s | 819 kB 00:00 (290/591): netpbm-11.02.00-2.fc39.aarch64.rpm 32 MB/s | 183 kB 00:00 (291/591): nettle-3.9.1-2.fc39.aarch64.rpm 49 MB/s | 434 kB 00:00 (292/591): numactl-devel-2.0.16-3.fc39.aarch64. 
3.8 MB/s | 22 kB 00:00 (293/591): numactl-libs-2.0.16-3.fc39.aarch64.r 4.8 MB/s | 30 kB 00:00 (294/591): objenesis-3.3-3.fc39.noarch.rpm 22 MB/s | 116 kB 00:00 (295/591): objectweb-asm-9.5-2.fc39.noarch.rpm 54 MB/s | 360 kB 00:00 (296/591): ocl-icd-2.3.2-2.fc39.aarch64.rpm 10 MB/s | 60 kB 00:00 (297/591): openblas-0.3.21-6.fc39.aarch64.rpm 12 MB/s | 35 kB 00:00 (298/591): ogdi-4.1.0-11.fc39.aarch64.rpm 48 MB/s | 233 kB 00:00 (299/591): openblas-devel-0.3.21-6.fc39.aarch64 3.5 MB/s | 80 kB 00:00 (300/591): openblas-openmp-0.3.21-6.fc39.aarch6 149 MB/s | 3.7 MB 00:00 (301/591): openblas-openmp64-0.3.21-6.fc39.aarc 129 MB/s | 3.7 MB 00:00 (302/591): ocl-icd-devel-2.3.2-2.fc39.aarch64.r 811 kB/s | 58 kB 00:00 (303/591): openblas-serial-0.3.21-6.fc39.aarch6 149 MB/s | 3.6 MB 00:00 (304/591): openblas-openmp64_-0.3.21-6.fc39.aar 50 MB/s | 3.7 MB 00:00 (305/591): openblas-serial64_-0.3.21-6.fc39.aar 39 MB/s | 3.5 MB 00:00 (306/591): openblas-serial64-0.3.21-6.fc39.aarc 29 MB/s | 3.5 MB 00:00 (307/591): openblas-threads-0.3.21-6.fc39.aarch 23 MB/s | 3.7 MB 00:00 (308/591): opencore-amr-0.1.6-4.fc39.aarch64.rp 43 MB/s | 173 kB 00:00 (309/591): openexr-libs-3.1.10-2.fc39.aarch64.r 43 MB/s | 1.1 MB 00:00 (310/591): openpgm-5.2.122-32.fc39.aarch64.rpm 18 MB/s | 172 kB 00:00 (311/591): openpgm-devel-5.2.122-32.fc39.aarch6 1.8 MB/s | 67 kB 00:00 (312/591): openslide-3.4.1-24.fc39.aarch64.rpm 6.4 MB/s | 105 kB 00:00 (313/591): opentest4j-1.2.0-14.fc39.noarch.rpm 1.9 MB/s | 24 kB 00:00 (314/591): openblas-threads64_-0.3.21-6.fc39.aa 19 MB/s | 3.6 MB 00:00 (315/591): opus-1.3.1-13.fc39.aarch64.rpm 18 MB/s | 205 kB 00:00 (316/591): orc-0.4.33-3.fc39.aarch64.rpm 23 MB/s | 202 kB 00:00 (317/591): pango-1.51.0-1.fc39.aarch64.rpm 47 MB/s | 339 kB 00:00 (318/591): pcre2-devel-10.42-1.fc39.2.aarch64.r 92 MB/s | 505 kB 00:00 (319/591): pcre-8.45-1.fc39.4.aarch64.rpm 23 MB/s | 184 kB 00:00 (320/591): pcre2-utf16-10.42-1.fc39.2.aarch64.r 37 MB/s | 199 kB 00:00 (321/591): pcre2-utf32-10.42-1.fc39.2.aarch64.r 29 MB/s | 187 kB 00:00 (322/591): perl-Carp-1.54-500.fc39.noarch.rpm 4.4 MB/s | 29 kB 00:00 (323/591): perl-Digest-1.20-500.fc39.noarch.rpm 14 MB/s | 25 kB 00:00 (324/591): perl-Data-Dumper-2.188-501.fc39.aarc 8.1 MB/s | 55 kB 00:00 (325/591): perl-Digest-MD5-2.58-500.fc39.aarch6 13 MB/s | 36 kB 00:00 (326/591): perl-Encode-3.19-500.fc39.aarch64.rp 218 MB/s | 1.7 MB 00:00 (327/591): perl-Error-0.17029-13.fc39.noarch.rp 6.9 MB/s | 40 kB 00:00 (328/591): perl-Exporter-5.77-500.fc39.noarch.r 12 MB/s | 31 kB 00:00 (329/591): perl-File-Path-2.18-500.fc39.noarch. 12 MB/s | 35 kB 00:00 (330/591): perl-File-Temp-0.231.100-500.fc39.no 23 MB/s | 58 kB 00:00 (331/591): perl-Getopt-Long-2.54-500.fc39.noarc 20 MB/s | 60 kB 00:00 (332/591): perl-IO-Socket-IP-0.42-1.fc39.noarch 13 MB/s | 42 kB 00:00 (333/591): perl-HTTP-Tiny-0.088-3.fc39.noarch.r 8.7 MB/s | 56 kB 00:00 (334/591): perl-IO-Socket-SSL-2.083-3.fc39.noar 33 MB/s | 225 kB 00:00 (335/591): perl-MIME-Base64-3.16-500.fc39.aarch 5.5 MB/s | 30 kB 00:00 (336/591): perl-Mozilla-CA-20230801-1.fc39.noar 3.9 MB/s | 13 kB 00:00 (337/591): perl-Net-SSLeay-1.92-10.fc39.aarch64 41 MB/s | 356 kB 00:00 (338/591): perl-PathTools-3.89-500.fc39.aarch64 7.5 MB/s | 88 kB 00:00 (339/591): perl-Pod-Escapes-1.07-500.fc39.noarc 1.4 MB/s | 20 kB 00:00 (340/591): perl-Pod-Perldoc-3.28.01-501.fc39.no 9.9 MB/s | 86 kB 00:00 (341/591): perl-Pod-Simple-3.45-4.fc39.noarch.r 14 MB/s | 218 kB 00:00 (342/591): perl-Pod-Usage-2.03-500.fc39.noarch. 
2.5 MB/s | 39 kB 00:00 (343/591): perl-Socket-2.037-3.fc39.aarch64.rpm 17 MB/s | 56 kB 00:00 (344/591): perl-Scalar-List-Utils-1.63-500.fc39 14 MB/s | 71 kB 00:00 (345/591): perl-Storable-3.32-500.fc39.aarch64. 27 MB/s | 97 kB 00:00 (346/591): openblas-threads64-0.3.21-6.fc39.aar 11 MB/s | 3.7 MB 00:00 (347/591): perl-Term-ANSIColor-5.01-501.fc39.no 4.6 MB/s | 47 kB 00:00 (348/591): perl-Term-Cap-1.18-500.fc39.noarch.r 2.9 MB/s | 22 kB 00:00 (349/591): perl-TermReadKey-2.38-18.fc39.aarch6 9.4 MB/s | 35 kB 00:00 (350/591): perl-Text-Tabs+Wrap-2023.0511-3.fc39 5.8 MB/s | 22 kB 00:00 (351/591): perl-Text-ParseWords-3.31-500.fc39.n 2.9 MB/s | 16 kB 00:00 (352/591): perl-Time-Local-1.350-3.fc39.noarch. 12 MB/s | 34 kB 00:00 (353/591): perl-URI-5.21-1.fc39.noarch.rpm 27 MB/s | 125 kB 00:00 (354/591): perl-constant-1.33-501.fc39.noarch.r 4.5 MB/s | 22 kB 00:00 (355/591): perl-libnet-3.15-501.fc39.noarch.rpm 26 MB/s | 129 kB 00:00 (356/591): perl-parent-0.241-500.fc39.noarch.rp 3.1 MB/s | 14 kB 00:00 (357/591): perl-podlators-5.01-500.fc39.noarch. 16 MB/s | 125 kB 00:00 (358/591): pixman-0.42.2-2.fc39.aarch64.rpm 16 MB/s | 216 kB 00:00 (359/591): poppler-23.08.0-1.fc39.aarch64.rpm 87 MB/s | 1.1 MB 00:00 (360/591): poppler-data-0.4.11-5.fc39.noarch.rp 112 MB/s | 2.0 MB 00:00 (361/591): poppler-glib-23.08.0-1.fc39.aarch64. 13 MB/s | 178 kB 00:00 (362/591): proj-9.2.1-2.fc39.aarch64.rpm 42 MB/s | 1.3 MB 00:00 (363/591): protobuf-3.19.6-6.fc39.aarch64.rpm 40 MB/s | 923 kB 00:00 (364/591): proj-data-9.2.1-2.fc39.noarch.rpm 51 MB/s | 1.3 MB 00:00 (365/591): pybind11-devel-2.11.1-1.fc39.aarch64 45 MB/s | 176 kB 00:00 (366/591): pugixml-1.13-3.fc39.aarch64.rpm 14 MB/s | 96 kB 00:00 (367/591): python-pip-wheel-23.2.1-1.fc39.noarc 141 MB/s | 1.5 MB 00:00 (368/591): python-rpm-macros-3.12-4.fc39.noarch 2.1 MB/s | 19 kB 00:00 (369/591): python3-pybind11-2.11.1-1.fc39.aarch 30 MB/s | 198 kB 00:00 (370/591): python3-packaging-23.1-4.fc39.noarch 14 MB/s | 114 kB 00:00 (371/591): python3-rpm-generators-14-7.fc39.noa 6.0 MB/s | 30 kB 00:00 (372/591): python3-pyyaml-6.0.1-11.fc39.aarch64 27 MB/s | 223 kB 00:00 (373/591): python3-rpm-macros-3.12-4.fc39.noarc 2.2 MB/s | 14 kB 00:00 (374/591): python3-setuptools-67.7.2-7.fc39.noa 126 MB/s | 1.5 MB 00:00 (375/591): python3-six-1.16.0-12.fc39.noarch.rp 4.6 MB/s | 41 kB 00:00 (376/591): rdma-core-devel-46.0-4.fc39.aarch64. 55 MB/s | 429 kB 00:00 (377/591): python3-typing-extensions-4.8.0-1.fc 5.9 MB/s | 75 kB 00:00 (378/591): re2-20220601-3.fc39.aarch64.rpm 17 MB/s | 187 kB 00:00 (379/591): rhash-1.4.3-3.fc39.aarch64.rpm 9.3 MB/s | 192 kB 00:00 (380/591): python3-numpy-1.24.4-2.fc39.aarch64. 87 MB/s | 7.2 MB 00:00 (381/591): scotch-7.0.3-3.fc39.aarch64.rpm 36 MB/s | 276 kB 00:00 (382/591): scotch-devel-7.0.3-3.fc39.aarch64.rp 3.3 MB/s | 25 kB 00:00 (383/591): shared-mime-info-2.2-4.fc39.aarch64. 
54 MB/s | 380 kB 00:00 (384/591): snappy-1.1.10-2.fc39.aarch64.rpm 11 MB/s | 37 kB 00:00 (385/591): rocksdb-devel-8.1.1-2.fc39.aarch64.r 2.9 MB/s | 292 kB 00:00 (386/591): snappy-devel-1.1.10-2.fc39.aarch64.r 379 kB/s | 22 kB 00:00 (387/591): soxr-0.1.3-14.fc39.aarch64.rpm 21 MB/s | 71 kB 00:00 (388/591): speex-1.2.0-15.fc39.aarch64.rpm 16 MB/s | 64 kB 00:00 (389/591): suitesparse-5.13.0-3.fc39.aarch64.rp 107 MB/s | 1.0 MB 00:00 (390/591): srt-libs-1.5.3-1.fc39.aarch64.rpm 30 MB/s | 350 kB 00:00 (391/591): tbb-2020.3-20.fc39.aarch64.rpm 29 MB/s | 140 kB 00:00 (392/591): svt-av1-libs-1.4.1-3.fc39.aarch64.rp 124 MB/s | 1.0 MB 00:00 (393/591): rocksdb-8.1.1-2.fc39.aarch64.rpm 20 MB/s | 2.7 MB 00:00 (394/591): tbb-devel-2020.3-20.fc39.aarch64.rpm 42 MB/s | 335 kB 00:00 (395/591): tcl-8.6.12-5.fc39.aarch64.rpm 163 MB/s | 1.1 MB 00:00 (396/591): twolame-libs-0.4.0-3.fc39.aarch64.rp 14 MB/s | 69 kB 00:00 (397/591): uriparser-0.9.7-3.fc39.aarch64.rpm 13 MB/s | 60 kB 00:00 (398/591): unixODBC-2.3.11-4.fc39.aarch64.rpm 55 MB/s | 470 kB 00:00 (399/591): urw-base35-bookman-fonts-20200910-18 88 MB/s | 847 kB 00:00 (400/591): urw-base35-c059-fonts-20200910-18.fc 68 MB/s | 874 kB 00:00 (401/591): urw-base35-fonts-20200910-18.fc39.no 854 kB/s | 10 kB 00:00 (402/591): urw-base35-fonts-common-20200910-18. 2.2 MB/s | 21 kB 00:00 (403/591): urw-base35-d050000l-fonts-20200910-1 3.3 MB/s | 76 kB 00:00 (404/591): urw-base35-gothic-fonts-20200910-18. 59 MB/s | 643 kB 00:00 (405/591): urw-base35-nimbus-roman-fonts-202009 51 MB/s | 856 kB 00:00 (406/591): urw-base35-nimbus-mono-ps-fonts-2020 37 MB/s | 795 kB 00:00 (407/591): urw-base35-nimbus-sans-fonts-2020091 89 MB/s | 1.3 MB 00:00 (408/591): urw-base35-standard-symbols-ps-fonts 17 MB/s | 42 kB 00:00 (409/591): urw-base35-z003-fonts-20200910-18.fc 50 MB/s | 276 kB 00:00 (410/591): urw-base35-p052-fonts-20200910-18.fc 93 MB/s | 974 kB 00:00 (411/591): utf8proc-2.7.0-5.fc39.aarch64.rpm 12 MB/s | 80 kB 00:00 (412/591): vapoursynth-libs-63-2.fc39.aarch64.r 63 MB/s | 323 kB 00:00 (413/591): vo-amrwbenc-0.1.3-19.fc39.aarch64.rp 17 MB/s | 77 kB 00:00 (414/591): xapian-core-libs-1.4.23-1.fc39.aarch 107 MB/s | 710 kB 00:00 (415/591): xcb-util-0.4.1-3.fc39.aarch64.rpm 2.8 MB/s | 19 kB 00:00 (416/591): xcb-util-image-0.4.1-3.fc39.aarch64. 
2.7 MB/s | 19 kB 00:00 (417/591): xcb-util-keysyms-0.4.1-3.fc39.aarch6 2.0 MB/s | 14 kB 00:00 (418/591): xcb-util-renderutil-0.3.10-3.fc39.aa 3.4 MB/s | 17 kB 00:00 (419/591): xcb-util-wm-0.4.2-3.fc39.aarch64.rpm 5.5 MB/s | 31 kB 00:00 (420/591): xml-common-0.6.3-61.fc39.noarch.rpm 8.9 MB/s | 31 kB 00:00 (421/591): xorg-x11-proto-devel-2023.2-2.fc39.n 63 MB/s | 298 kB 00:00 (422/591): zeromq-4.3.4-8.fc39.aarch64.rpm 23 MB/s | 452 kB 00:00 (423/591): xvidcore-1.3.7-10.fc39.aarch64.rpm 9.7 MB/s | 227 kB 00:00 (424/591): zlib-devel-1.2.13-4.fc39.aarch64.rpm 14 MB/s | 45 kB 00:00 (425/591): zeromq-devel-4.3.4-8.fc39.aarch64.rp 281 kB/s | 16 kB 00:00 (426/591): alsa-lib-1.2.11-2.fc39.aarch64.rpm 46 MB/s | 510 kB 00:00 (427/591): annobin-docs-12.46-1.fc39.noarch.rpm 29 MB/s | 88 kB 00:00 (428/591): annobin-plugin-gcc-12.46-1.fc39.aarc 141 MB/s | 958 kB 00:00 (429/591): armadillo-12.8.1-1.fc39.aarch64.rpm 5.2 MB/s | 31 kB 00:00 (430/591): arpack-3.9.1-1.fc39.aarch64.rpm 35 MB/s | 178 kB 00:00 (431/591): blosc-1.21.5-2.fc39.aarch64.rpm 5.4 MB/s | 48 kB 00:00 (432/591): cpp-13.2.1-7.fc39.aarch64.rpm 257 MB/s | 9.7 MB 00:00 (433/591): zvbi-0.2.35-21.fc39.aarch64.rpm 3.0 MB/s | 418 kB 00:00 (434/591): crypto-policies-scripts-20231204-1.g 15 MB/s | 117 kB 00:00 (435/591): cups-libs-2.4.7-11.fc39.aarch64.rpm 29 MB/s | 268 kB 00:00 (436/591): emacs-filesystem-29.3-1.fc39.noarch. 1.7 MB/s | 7.2 kB 00:00 (437/591): dbus-broker-35-2.fc39.aarch64.rpm 19 MB/s | 172 kB 00:00 (438/591): fftw-3.3.10-10.fc39.aarch64.rpm 6.9 MB/s | 40 kB 00:00 (439/591): expat-2.6.2-1.fc39.aarch64.rpm 12 MB/s | 112 kB 00:00 (440/591): fftw-libs-3.3.10-10.fc39.aarch64.rpm 911 kB/s | 8.0 kB 00:00 (441/591): fftw-devel-3.3.10-10.fc39.aarch64.rp 8.8 MB/s | 133 kB 00:00 (442/591): fftw-libs-double-3.3.10-10.fc39.aarc 69 MB/s | 835 kB 00:00 (443/591): fftw-libs-long-3.3.10-10.fc39.aarch6 54 MB/s | 784 kB 00:00 (444/591): fftw-libs-single-3.3.10-10.fc39.aarc 73 MB/s | 881 kB 00:00 (445/591): flexiblas-3.4.2-1.fc39.aarch64.rpm 3.4 MB/s | 25 kB 00:00 (446/591): flexiblas-netlib-3.4.2-1.fc39.aarch6 64 MB/s | 2.6 MB 00:00 (447/591): flexiblas-netlib64-3.4.2-1.fc39.aarc 60 MB/s | 2.5 MB 00:00 (448/591): flexiblas-openblas-openmp-3.4.2-1.fc 2.8 MB/s | 17 kB 00:00 (449/591): flexiblas-openblas-openmp64-3.4.2-1. 3.4 MB/s | 17 kB 00:00 (450/591): fontconfig-2.14.2-6.fc39.aarch64.rpm 38 MB/s | 302 kB 00:00 (451/591): vtk-9.2.6-7.fc39.aarch64.rpm 62 MB/s | 22 MB 00:00 (452/591): gcc-plugin-annobin-13.2.1-7.fc39.aar 1.6 MB/s | 52 kB 00:00 (453/591): gcc-13.2.1-7.fc39.aarch64.rpm 179 MB/s | 31 MB 00:00 (454/591): geos-3.12.1-1.fc39.aarch64.rpm 66 MB/s | 1.0 MB 00:00 (455/591): gdal-libs-3.7.3-4.fc39.aarch64.rpm 74 MB/s | 8.0 MB 00:00 (456/591): giflib-5.2.2-1.fc39.aarch64.rpm 2.8 MB/s | 52 kB 00:00 (457/591): git-2.44.0-1.fc39.aarch64.rpm 20 MB/s | 53 kB 00:00 (458/591): git-core-2.44.0-1.fc39.aarch64.rpm 172 MB/s | 4.6 MB 00:00 (459/591): git-core-doc-2.44.0-1.fc39.noarch.rp 56 MB/s | 2.9 MB 00:00 (460/591): glib2-2.78.3-1.fc39.aarch64.rpm 41 MB/s | 2.8 MB 00:00 (461/591): gnutls-3.8.4-1.fc39.aarch64.rpm 20 MB/s | 1.1 MB 00:00 (462/591): google-noto-sans-vf-fonts-20240101-1 53 MB/s | 593 kB 00:00 (463/591): google-noto-fonts-common-20240101-1. 
739 kB/s | 17 kB 00:00 (464/591): groff-base-1.23.0-3.fc39.aarch64.rpm 53 MB/s | 1.1 MB 00:00 (465/591): gstreamer1-1.22.9-1.fc39.aarch64.rpm 54 MB/s | 1.4 MB 00:00 (466/591): gcc-c++-13.2.1-7.fc39.aarch64.rpm 29 MB/s | 12 MB 00:00 (467/591): graphviz-8.1.0-6.fc39.aarch64.rpm 56 MB/s | 4.9 MB 00:00 (468/591): gstreamer1-plugins-base-1.22.9-1.fc3 51 MB/s | 2.1 MB 00:00 (469/591): highway-1.1.0-1.fc39.aarch64.rpm 8.9 MB/s | 97 kB 00:00 (470/591): imath-3.1.10-1.fc39.aarch64.rpm 7.8 MB/s | 93 kB 00:00 (471/591): keyutils-libs-devel-1.6.3-1.fc39.aar 3.9 MB/s | 60 kB 00:00 (472/591): kernel-headers-6.8.3-200.fc39.aarch6 47 MB/s | 1.6 MB 00:00 (473/591): krb5-devel-1.21.2-3.fc39.aarch64.rpm 9.2 MB/s | 144 kB 00:00 (474/591): libXpm-3.5.17-1.fc39.aarch64.rpm 5.3 MB/s | 64 kB 00:00 (475/591): libaec-1.1.2-1.fc39.aarch64.rpm 3.6 MB/s | 36 kB 00:00 (476/591): libaom-3.8.2-1.fc39.aarch64.rpm 35 MB/s | 1.5 MB 00:00 (477/591): libarrow-doc-13.0.0-4.fc39.noarch.rp 3.2 MB/s | 27 kB 00:00 (478/591): libasan-13.2.1-7.fc39.aarch64.rpm 45 MB/s | 455 kB 00:00 (479/591): libarrow-13.0.0-4.fc39.aarch64.rpm 61 MB/s | 4.3 MB 00:00 (480/591): libatomic-13.2.1-7.fc39.aarch64.rpm 2.9 MB/s | 42 kB 00:00 (481/591): libavformat-free-6.1.1-3.fc39.aarch6 25 MB/s | 1.1 MB 00:00 (482/591): libavutil-free-6.1.1-3.fc39.aarch64. 15 MB/s | 344 kB 00:00 (483/591): libdeflate-1.20-1.fc39.aarch64.rpm 6.8 MB/s | 62 kB 00:00 (484/591): libdrm-2.4.120-1.fc39.aarch64.rpm 5.7 MB/s | 131 kB 00:00 (485/591): libgfortran-13.2.1-7.fc39.aarch64.rp 30 MB/s | 438 kB 00:00 (486/591): libavcodec-free-6.1.1-3.fc39.aarch64 29 MB/s | 3.9 MB 00:00 (487/591): libgs-10.02.1-2.fc39.aarch64.rpm 102 MB/s | 3.4 MB 00:00 (488/591): libimagequant-4.0.3-2.fc39.aarch64.r 31 MB/s | 304 kB 00:00 (489/591): libkadm5-1.21.2-3.fc39.aarch64.rpm 8.0 MB/s | 78 kB 00:00 (490/591): libnauty-2.8.8-1.fc39.aarch64.rpm 72 MB/s | 707 kB 00:00 (491/591): libinput-1.25.0-4.fc39.aarch64.rpm 7.9 MB/s | 209 kB 00:00 (492/591): libnl3-3.9.0-1.fc39.aarch64.rpm 32 MB/s | 345 kB 00:00 (493/591): libopenmpt-0.6.12-1.fc39.aarch64.rpm 22 MB/s | 603 kB 00:00 (494/591): liborc1-1.9.3-1.fc39.aarch64.rpm 18 MB/s | 448 kB 00:00 (495/591): libproxy-0.5.3-3.fc39.aarch64.rpm 5.2 MB/s | 48 kB 00:00 (496/591): libsmbclient-4.19.5-1.fc39.aarch64.r 5.1 MB/s | 81 kB 00:00 (497/591): librsvg2-2.57.1-1.fc39.aarch64.rpm 45 MB/s | 1.5 MB 00:00 (498/591): libsodium-1.0.18-15.fc39.aarch64.rpm 2.0 MB/s | 121 kB 00:00 (499/591): java-17-openjdk-headless-17.0.9.0.9- 96 MB/s | 44 MB 00:00 (500/591): libstdc++-devel-13.2.1-7.fc39.aarch6 48 MB/s | 2.6 MB 00:00 (501/591): libswresample-free-6.1.1-3.fc39.aarc 10 MB/s | 63 kB 00:00 (502/591): libubsan-13.2.1-7.fc39.aarch64.rpm 47 MB/s | 209 kB 00:00 (503/591): libswscale-free-6.1.1-3.fc39.aarch64 23 MB/s | 166 kB 00:00 (504/591): liburing-2.5-1.fc39.aarch64.rpm 15 MB/s | 40 kB 00:00 (505/591): libusb1-1.0.27-1.fc39.aarch64.rpm 25 MB/s | 76 kB 00:00 (506/591): libuv-1.48.0-1.fc39.aarch64.rpm 42 MB/s | 249 kB 00:00 (507/591): libuv-devel-1.48.0-1.fc39.aarch64.rp 13 MB/s | 42 kB 00:00 (508/591): libuv-static-1.48.0-1.fc39.aarch64.r 25 MB/s | 107 kB 00:00 (509/591): libva-2.20.0-2.fc39.aarch64.rpm 13 MB/s | 108 kB 00:00 (510/591): libvpx-1.13.1-1.fc39.aarch64.rpm 126 MB/s | 1.1 MB 00:00 (511/591): libwacom-2.10.0-1.fc39.aarch64.rpm 8.0 MB/s | 43 kB 00:00 (512/591): libwacom-data-2.10.0-1.fc39.noarch.r 40 MB/s | 196 kB 00:00 (513/591): libwbclient-4.19.5-1.fc39.aarch64.rp 9.8 MB/s | 49 kB 00:00 (514/591): libxkbcommon-1.6.0-1.fc39.aarch64.rp 41 MB/s | 143 kB 00:00 
(515/591): libxkbcommon-x11-1.6.0-1.fc39.aarch6 4.6 MB/s | 21 kB 00:00 (516/591): libzstd-devel-1.5.6-1.fc39.aarch64.r 15 MB/s | 52 kB 00:00 (517/591): libsodium-devel-1.0.18-15.fc39.aarch 6.6 MB/s | 1.1 MB 00:00 (518/591): lmdb-0.9.32-1.fc39.aarch64.rpm 1.4 MB/s | 33 kB 00:00 (519/591): lmdb-libs-0.9.32-1.fc39.aarch64.rpm 4.5 MB/s | 61 kB 00:00 (520/591): mariadb-connector-c-3.3.8-1.fc39.aar 16 MB/s | 214 kB 00:00 (521/591): mariadb-connector-c-config-3.3.8-1.f 566 kB/s | 8.6 kB 00:00 (522/591): lmdb-devel-0.9.32-1.fc39.aarch64.rpm 548 kB/s | 26 kB 00:00 (523/591): mbedtls-2.28.7-1.fc39.aarch64.rpm 26 MB/s | 401 kB 00:00 (524/591): mesa-filesystem-23.3.6-1.fc39.aarch6 1.2 MB/s | 19 kB 00:00 (525/591): mesa-libEGL-23.3.6-1.fc39.aarch64.rp 15 MB/s | 134 kB 00:00 (526/591): mesa-libGL-23.3.6-1.fc39.aarch64.rpm 19 MB/s | 188 kB 00:00 (527/591): mesa-libgbm-23.3.6-1.fc39.aarch64.rp 13 MB/s | 47 kB 00:00 (528/591): mesa-libglapi-23.3.6-1.fc39.aarch64. 15 MB/s | 67 kB 00:00 (529/591): minizip-ng-3.0.7-5.fc39.aarch64.rpm 11 MB/s | 70 kB 00:00 (530/591): ncurses-6.4-7.20230520.fc39.1.aarch6 42 MB/s | 414 kB 00:00 (531/591): llvm-libs-17.0.6-3.fc39.aarch64.rpm 192 MB/s | 26 MB 00:00 (532/591): nspr-4.35.0-18.fc39.aarch64.rpm 4.3 MB/s | 135 kB 00:00 (533/591): nss-3.98.0-1.fc39.aarch64.rpm 25 MB/s | 696 kB 00:00 (534/591): nss-sysinit-3.98.0-1.fc39.aarch64.rp 5.3 MB/s | 18 kB 00:00 (535/591): nss-softokn-freebl-3.98.0-1.fc39.aar 69 MB/s | 338 kB 00:00 (536/591): nss-softokn-3.98.0-1.fc39.aarch64.rp 53 MB/s | 415 kB 00:00 (537/591): nss-util-3.98.0-1.fc39.aarch64.rpm 9.3 MB/s | 86 kB 00:00 (538/591): openjpeg2-2.5.2-1.fc39.aarch64.rpm 16 MB/s | 176 kB 00:00 (539/591): openssh-9.3p1-10.fc39.aarch64.rpm 58 MB/s | 431 kB 00:00 (540/591): openssh-clients-9.3p1-10.fc39.aarch6 98 MB/s | 729 kB 00:00 (541/591): perl-AutoLoader-5.74-502.fc39.noarch 5.0 MB/s | 21 kB 00:00 (542/591): perl-Class-Struct-0.68-502.fc39.noar 12 MB/s | 22 kB 00:00 (543/591): perl-B-1.88-502.fc39.aarch64.rpm 55 MB/s | 178 kB 00:00 (544/591): perl-Errno-1.37-502.fc39.aarch64.rpm 7.4 MB/s | 15 kB 00:00 (545/591): perl-DynaLoader-1.54-502.fc39.aarch6 6.6 MB/s | 26 kB 00:00 (546/591): perl-Fcntl-1.15-502.fc39.aarch64.rpm 8.7 MB/s | 21 kB 00:00 (547/591): perl-File-Basename-2.86-502.fc39.noa 8.6 MB/s | 17 kB 00:00 (548/591): perl-File-Find-1.43-502.fc39.noarch. 6.0 MB/s | 25 kB 00:00 (549/591): perl-File-stat-1.13-502.fc39.noarch. 4.5 MB/s | 17 kB 00:00 (550/591): opencl-headers-3.0-19.20231212git236 2.5 MB/s | 89 kB 00:00 (551/591): perl-FileHandle-2.05-502.fc39.noarch 5.2 MB/s | 16 kB 00:00 (552/591): perl-Getopt-Std-1.13-502.fc39.noarch 5.1 MB/s | 16 kB 00:00 (553/591): perl-Git-2.44.0-1.fc39.noarch.rpm 14 MB/s | 40 kB 00:00 (554/591): perl-IPC-Open3-1.22-502.fc39.noarch. 
7.8 MB/s | 22 kB 00:00 (555/591): perl-IO-1.52-502.fc39.aarch64.rpm 22 MB/s | 83 kB 00:00 (556/591): perl-POSIX-2.13-502.fc39.aarch64.rpm 34 MB/s | 98 kB 00:00 (557/591): perl-SelectSaver-1.02-502.fc39.noarc 2.4 MB/s | 12 kB 00:00 (558/591): perl-Symbol-1.09-502.fc39.noarch.rpm 2.9 MB/s | 14 kB 00:00 (559/591): perl-base-2.27-502.fc39.noarch.rpm 3.1 MB/s | 16 kB 00:00 (560/591): perl-if-0.61.000-502.fc39.noarch.rpm 4.8 MB/s | 14 kB 00:00 (561/591): perl-interpreter-5.38.2-502.fc39.aar 24 MB/s | 72 kB 00:00 (562/591): perl-lib-0.65-502.fc39.aarch64.rpm 5.7 MB/s | 15 kB 00:00 (563/591): perl-mro-1.28-502.fc39.aarch64.rpm 9.7 MB/s | 29 kB 00:00 (564/591): perl-locale-1.10-502.fc39.noarch.rpm 3.4 MB/s | 14 kB 00:00 (565/591): perl-libs-5.38.2-502.fc39.aarch64.rp 160 MB/s | 2.3 MB 00:00 (566/591): perl-overload-1.37-502.fc39.noarch.r 4.0 MB/s | 46 kB 00:00 (567/591): perl-overloading-0.02-502.fc39.noarc 1.2 MB/s | 13 kB 00:00 (568/591): perl-vars-1.05-502.fc39.noarch.rpm 2.6 MB/s | 13 kB 00:00 (569/591): procps-ng-4.0.3-5.fc39.aarch64.rpm 56 MB/s | 373 kB 00:00 (570/591): pyproject-rpm-macros-1.12.0-1.fc39.n 5.0 MB/s | 41 kB 00:00 (571/591): python3-devel-3.12.2-2.fc39.aarch64. 61 MB/s | 312 kB 00:00 (572/591): python3-3.12.2-2.fc39.aarch64.rpm 3.0 MB/s | 27 kB 00:00 (573/591): qt-settings-39.1-1.fc39.noarch.rpm 841 kB/s | 9.6 kB 00:00 (574/591): qt5-qtbase-common-5.15.12-5.fc39.noa 753 kB/s | 12 kB 00:00 (575/591): python3-libs-3.12.2-2.fc39.aarch64.r 182 MB/s | 9.1 MB 00:00 (576/591): qt5-qtbase-5.15.12-5.fc39.aarch64.rp 62 MB/s | 3.5 MB 00:00 (577/591): rav1e-libs-0.7.1-1.fc39.aarch64.rpm 54 MB/s | 798 kB 00:00 (578/591): rsvg-pixbuf-loader-2.57.1-1.fc39.aar 952 kB/s | 16 kB 00:00 (579/591): samba-common-4.19.5-1.fc39.noarch.rp 14 MB/s | 152 kB 00:00 (580/591): samba-common-libs-4.19.5-1.fc39.aarc 12 MB/s | 115 kB 00:00 (581/591): qt5-qtbase-gui-5.15.12-5.fc39.aarch6 60 MB/s | 6.3 MB 00:00 (582/591): systemd-254.10-1.fc39.aarch64.rpm 106 MB/s | 4.6 MB 00:00 (583/591): samba-client-libs-4.19.5-1.fc39.aarc 66 MB/s | 5.6 MB 00:00 (584/591): systemd-pam-254.10-1.fc39.aarch64.rp 18 MB/s | 352 kB 00:00 (585/591): tzdata-2024a-2.fc39.noarch.rpm 50 MB/s | 715 kB 00:00 (586/591): systemd-rpm-macros-254.10-1.fc39.noa 1.2 MB/s | 28 kB 00:00 (587/591): tzdata-java-2024a-2.fc39.noarch.rpm 21 MB/s | 208 kB 00:00 (588/591): vim-filesystem-9.1.264-1.fc39.noarch 3.1 MB/s | 17 kB 00:00 (589/591): xerces-c-3.2.5-1.fc39.aarch64.rpm 107 MB/s | 884 kB 00:00 (590/591): xkeyboard-config-2.40-1.fc39.noarch. 79 MB/s | 971 kB 00:00 (591/591): zimg-3.0.5-1.fc39.aarch64.rpm 2.1 MB/s | 139 kB 00:00 -------------------------------------------------------------------------------- Total 205 MB/s | 2.4 GB 00:12 Running transaction check Transaction check succeeded. Running transaction test Transaction test succeeded. 
Running transaction Running scriptlet: copy-jdk-configs-4.1-3.fc39.noarch 1/1 Running scriptlet: java-17-openjdk-headless-1:17.0.9.0.9-3.fc39.aarch64 1/1 Preparing : 1/1 Installing : cmake-filesystem-3.27.7-1.fc39.aarch64 1/591 Installing : libpng-2:1.6.37-15.fc39.aarch64 2/591 Installing : libgfortran-13.2.1-7.fc39.aarch64 3/591 Installing : expat-2.6.2-1.fc39.aarch64 4/591 Installing : libjpeg-turbo-2.1.4-3.fc39.aarch64 5/591 Installing : openblas-0.3.21-6.fc39.aarch64 6/591 Installing : javapackages-filesystem-6.1.0-10.fc39.noarch 7/591 Installing : cuda-toolkit-config-common-12.4.127-1.noarch 8/591 Installing : cuda-toolkit-12-config-common-12.4.127-1.noarch 9/591 Installing : cuda-toolkit-12-3-config-common-12.3.101-1.noarc 10/591 Installing : libdrm-2.4.120-1.fc39.aarch64 11/591 Installing : snappy-1.1.10-2.fc39.aarch64 12/591 Installing : openjpeg2-2.5.2-1.fc39.aarch64 13/591 Installing : nspr-4.35.0-18.fc39.aarch64 14/591 Installing : libwebp-1.3.2-2.fc39.aarch64 15/591 Installing : libX11-xcb-1.8.7-1.fc39.aarch64 16/591 Installing : libcublas-12-3-12.3.4.1-2.aarch64 17/591 Running scriptlet: libcublas-12-3-12.3.4.1-2.aarch64 17/591 Installing : libtalloc-2.4.1-1.fc39.aarch64 18/591 Installing : libogg-2:1.3.5-6.fc39.aarch64 19/591 Installing : libglvnd-1:1.7.0-1.fc39.aarch64 20/591 Installing : libglvnd-opengl-1:1.7.0-1.fc39.aarch64 21/591 Installing : fonts-filesystem-1:2.0.5-12.fc39.noarch 22/591 Installing : urw-base35-fonts-common-20200910-18.fc39.noarch 23/591 Installing : nss-util-3.98.0-1.fc39.aarch64 24/591 Installing : cuda-cudart-12-3-12.3.101-1.aarch64 25/591 Running scriptlet: cuda-cudart-12-3-12.3.101-1.aarch64 25/591 Installing : libuv-1:1.48.0-1.fc39.aarch64 26/591 Installing : libwayland-client-1.22.0-2.fc39.aarch64 27/591 Installing : libmpc-1.3.1-3.fc39.aarch64 28/591 Installing : gflags-2.2.2-12.fc39.aarch64 29/591 Installing : protobuf-compat-3.21.9-2.fc39.aarch64 30/591 Installing : libtheora-1:1.1.1-34.fc39.aarch64 31/591 Installing : libvorbis-1:1.3.7-8.fc39.aarch64 32/591 Installing : libtevent-0.15.0-1.fc39.aarch64 33/591 Installing : openblas-openmp-0.3.21-6.fc39.aarch64 34/591 Installing : lmdb-libs-0.9.32-1.fc39.aarch64 35/591 Installing : geos-3.12.1-1.fc39.aarch64 36/591 Installing : python-rpm-macros-3.12-4.fc39.noarch 37/591 Installing : lua-5.4.6-3.fc39.aarch64 38/591 Installing : libunwind-1.7.0-0.2.rc2.fc39.aarch64 39/591 Installing : libtool-ltdl-2.4.7-7.fc39.aarch64 40/591 Installing : libtdb-1.4.9-1.fc39.aarch64 41/591 Installing : libedit-3.1-48.20230828cvs.fc39.aarch64 42/591 Installing : libICE-1.0.10-11.fc39.aarch64 43/591 Installing : lcms2-2.15-2.fc39.aarch64 44/591 Installing : cuda-nvrtc-12-3-12.3.107-1.aarch64 45/591 Running scriptlet: cuda-nvrtc-12-3-12.3.107-1.aarch64 45/591 Installing : libcudnn8-8.9.7.29-2.cuda12.3.aarch64 46/591 Installing : pthreadpool-1:0.1-20240121.0.git178e3e06.fc39.aa 47/591 Installing : cpuinfo-1:0-20240327.0.gitf42f5eaf.fc39.aarch64 48/591 Installing : libSM-1.2.3-13.fc39.aarch64 49/591 Installing : unixODBC-2.3.11-4.fc39.aarch64 50/591 Installing : python3-rpm-macros-3.12-4.fc39.noarch 51/591 Installing : onnx-libs-1.17.0-20240404.0.git4128a090.fc39.aar 52/591 Installing : libcufft-12-3-11.0.12.1-2.aarch64 53/591 Running scriptlet: libcufft-12-3-11.0.12.1-2.aarch64 53/591 Installing : libcusparse-12-3-12.2.0.103-2.aarch64 54/591 Running scriptlet: libcusparse-12-3-12.2.0.103-2.aarch64 54/591 Installing : libcurand-12-3-10.3.4.107-1.aarch64 55/591 Running scriptlet: libcurand-12-3-10.3.4.107-1.aarch64 
55/591 Installing : openblas-openmp64-0.3.21-6.fc39.aarch64 56/591 Installing : flexiblas-netlib64-3.4.2-1.fc39.aarch64 57/591 Installing : flexiblas-netlib-3.4.2-1.fc39.aarch64 58/591 Installing : flexiblas-openblas-openmp-3.4.2-1.fc39.aarch64 59/591 Installing : flexiblas-3.4.2-1.fc39.aarch64 60/591 Installing : flexiblas-openblas-openmp64-3.4.2-1.fc39.aarch64 61/591 Installing : suitesparse-5.13.0-3.fc39.aarch64 62/591 Installing : hdf-libs-4.2.15-13.fc39.aarch64 63/591 Installing : rav1e-libs-0.7.1-1.fc39.aarch64 64/591 Installing : minizip-ng-3.0.7-5.fc39.aarch64 65/591 Installing : freexl-2.0.0-2.fc39.aarch64 66/591 Installing : mesa-libglapi-23.3.6-1.fc39.aarch64 67/591 Installing : libsodium-1.0.18-15.fc39.aarch64 68/591 Installing : libnl3-3.9.0-1.fc39.aarch64 69/591 Installing : libibverbs-46.0-4.fc39.aarch64 70/591 Installing : libatomic-13.2.1-7.fc39.aarch64 71/591 Installing : libaec-1.1.2-1.fc39.aarch64 72/591 Installing : hdf5-1.12.1-12.fc39.aarch64 73/591 Installing : imath-3.1.10-1.fc39.aarch64 74/591 Installing : openexr-libs-3.1.10-2.fc39.aarch64 75/591 Installing : fftw-libs-single-3.3.10-10.fc39.aarch64 76/591 Installing : fftw-libs-long-3.3.10-10.fc39.aarch64 77/591 Installing : fftw-libs-double-3.3.10-10.fc39.aarch64 78/591 Installing : alsa-lib-1.2.11-2.fc39.aarch64 79/591 Installing : xorg-x11-proto-devel-2023.2-2.fc39.noarch 80/591 Running scriptlet: xml-common-0.6.3-61.fc39.noarch 81/591 Installing : xml-common-0.6.3-61.fc39.noarch 81/591 Installing : tbb-2020.3-20.fc39.aarch64 82/591 Installing : svt-av1-libs-1.4.1-3.fc39.aarch64 83/591 Installing : scotch-7.0.3-3.fc39.aarch64 84/591 Installing : protobuf-3.19.6-6.fc39.aarch64 85/591 Installing : pcre2-utf16-10.42-1.fc39.2.aarch64 86/591 Installing : opus-1.3.1-13.fc39.aarch64 87/591 Installing : openpgm-5.2.122-32.fc39.aarch64 88/591 Installing : zeromq-4.3.4-8.fc39.aarch64 89/591 Installing : ocl-icd-2.3.2-2.fc39.aarch64 90/591 Installing : nettle-3.9.1-2.fc39.aarch64 91/591 Installing : gnutls-3.8.4-1.fc39.aarch64 92/591 Installing : glib2-2.78.3-1.fc39.aarch64 93/591 Installing : libgudev-238-2.fc39.aarch64 94/591 Installing : shared-mime-info-2.2-4.fc39.aarch64 95/591 Running scriptlet: shared-mime-info-2.2-4.fc39.aarch64 95/591 Installing : gdk-pixbuf2-2.42.10-5.fc39.aarch64 96/591 Installing : lua-posix-36.2.1-3.fc39.aarch64 97/591 Installing : libxshmfence-1.3-13.fc39.aarch64 98/591 Installing : libwayland-server-1.22.0-2.fc39.aarch64 99/591 Installing : liblerc-4.0.0-4.fc39.aarch64 100/591 Installing : libicu-73.2-2.fc39.aarch64 101/591 Installing : libibumad-46.0-4.fc39.aarch64 102/591 Installing : libevdev-1.13.1-2.fc39.aarch64 103/591 Installing : libdav1d-1.2.1-2.fc39.aarch64 104/591 Installing : libXau-1.0.11-3.fc39.aarch64 105/591 Installing : libxcb-1.13.1-12.fc39.aarch64 106/591 Installing : mesa-libgbm-23.3.6-1.fc39.aarch64 107/591 Installing : libglvnd-egl-1:1.7.0-1.fc39.aarch64 108/591 Installing : mesa-libEGL-23.3.6-1.fc39.aarch64 109/591 Installing : jsoncpp-1.9.5-5.fc39.aarch64 110/591 Installing : hiredis-1.0.2-5.fc39.aarch64 111/591 Installing : flatbuffers-23.5.26-3.fc39.aarch64 112/591 Installing : double-conversion-3.1.5-9.fc39.aarch64 113/591 Installing : dbus-libs-1:1.14.10-1.fc39.aarch64 114/591 Installing : avahi-libs-0.8-24.fc39.aarch64 115/591 Installing : cups-libs-1:2.4.7-11.fc39.aarch64 116/591 Installing : adobe-mappings-cmap-20230622-1.fc39.noarch 117/591 Installing : libnccl-2.21.5-1+cuda12.4.aarch64 118/591 Running scriptlet: libnccl-2.21.5-1+cuda12.4.aarch64 118/591 
Installing : gloo-1:0.5.0-20240302.0.git2565674c.cu12_3.fc39. 119/591 Installing : adobe-mappings-cmap-deprecated-20230622-1.fc39.n 120/591 Installing : libglvnd-gles-1:1.7.0-1.fc39.aarch64 121/591 Installing : xcb-util-0.4.1-3.fc39.aarch64 122/591 Installing : xcb-util-image-0.4.1-3.fc39.aarch64 123/591 Installing : xcb-util-keysyms-0.4.1-3.fc39.aarch64 124/591 Installing : xcb-util-renderutil-0.3.10-3.fc39.aarch64 125/591 Installing : xcb-util-wm-0.4.2-3.fc39.aarch64 126/591 Installing : libXau-devel-1.0.11-3.fc39.aarch64 127/591 Installing : libxcb-devel-1.13.1-12.fc39.aarch64 128/591 Installing : copy-jdk-configs-4.1-3.fc39.noarch 129/591 Installing : graphene-1.10.6-6.fc39.aarch64 130/591 Installing : srt-libs-1.5.3-1.fc39.aarch64 131/591 Installing : openpgm-devel-5.2.122-32.fc39.aarch64 132/591 Installing : liborc1-1.9.3-1.fc39.aarch64 133/591 Installing : scotch-devel-7.0.3-3.fc39.aarch64 134/591 Installing : iso-codes-4.15.0-2.fc39.noarch 135/591 Installing : fftw-3.3.10-10.fc39.aarch64 136/591 Installing : fftw-libs-3.3.10-10.fc39.aarch64 137/591 Installing : cgnslib-libs-4.4.0-2.fc39.aarch64 138/591 Installing : librdmacm-46.0-4.fc39.aarch64 139/591 Installing : libsodium-devel-1.0.18-15.fc39.aarch64 140/591 Installing : glpk-5.0-7.fc39.aarch64 141/591 Installing : coin-or-CoinUtils-2.11.4-10.fc39.aarch64 142/591 Installing : coin-or-Osi-0.108.6-9.fc39.aarch64 143/591 Installing : arpack-3.9.1-1.fc39.aarch64 144/591 Installing : magma-2.8.0-20240328.0.cu12_3.fc39.aarch64 145/591 Installing : pyproject-rpm-macros-1.12.0-1.fc39.noarch 146/591 Installing : nnpack-0-20230201.0.git70a77f48.fc38.aarch64 147/591 Installing : qnnpack-0-20190828.2.git7d2a4e99.fc38.aarch64 148/591 Installing : llvm16-libs-16.0.6-5.fc39.aarch64 149/591 Installing : llvm-libs-17.0.6-3.fc39.aarch64 150/591 Installing : halide-17.0.1-20240220.0.fc39.aarch64 151/591 Installing : libldb-2.8.0-1.fc39.aarch64 152/591 Installing : libunwind-devel-1.7.0-0.2.rc2.fc39.aarch64 153/591 Installing : lua-term-0.07-18.fc39.aarch64 154/591 Installing : librttopo-1.1.0-12.fc39.aarch64 155/591 Installing : lmdb-0.9.32-1.fc39.aarch64 156/591 Installing : protobuf-compat-compiler-3.21.9-2.fc39.aarch64 157/591 Installing : gflags-devel-2.2.2-12.fc39.aarch64 158/591 Installing : glog-0.3.5-18.fc39.aarch64 159/591 Installing : ceres-solver-2.1.0-6.fc39.aarch64 160/591 Installing : cuda-gcc-12-12.3.1-1.fc39.aarch64 161/591 Installing : cpp-13.2.1-7.fc39.aarch64 162/591 Installing : libwayland-cursor-1.22.0-2.fc39.aarch64 163/591 Installing : tensorpipe-0-20220513.1.gitbb1473a4.fc37.aarch64 164/591 Installing : libuv-static-1:1.48.0-1.fc39.aarch64 165/591 Installing : libuv-devel-1:1.48.0-1.fc39.aarch64 166/591 Installing : nss-softokn-freebl-3.98.0-1.fc39.aarch64 167/591 Installing : nss-softokn-3.98.0-1.fc39.aarch64 168/591 Installing : urw-base35-bookman-fonts-20200910-18.fc39.noarch 169/591 Running scriptlet: urw-base35-bookman-fonts-20200910-18.fc39.noarch 169/591 Installing : urw-base35-c059-fonts-20200910-18.fc39.noarch 170/591 Running scriptlet: urw-base35-c059-fonts-20200910-18.fc39.noarch 170/591 Installing : urw-base35-d050000l-fonts-20200910-18.fc39.noarc 171/591 Running scriptlet: urw-base35-d050000l-fonts-20200910-18.fc39.noarc 171/591 Installing : urw-base35-gothic-fonts-20200910-18.fc39.noarch 172/591 Running scriptlet: urw-base35-gothic-fonts-20200910-18.fc39.noarch 172/591 Installing : urw-base35-nimbus-mono-ps-fonts-20200910-18.fc39 173/591 Running scriptlet: urw-base35-nimbus-mono-ps-fonts-20200910-18.fc39 
173/591 Installing : urw-base35-nimbus-roman-fonts-20200910-18.fc39.n 174/591 Running scriptlet: urw-base35-nimbus-roman-fonts-20200910-18.fc39.n 174/591 Installing : urw-base35-nimbus-sans-fonts-20200910-18.fc39.no 175/591 Running scriptlet: urw-base35-nimbus-sans-fonts-20200910-18.fc39.no 175/591 Installing : urw-base35-p052-fonts-20200910-18.fc39.noarch 176/591 Running scriptlet: urw-base35-p052-fonts-20200910-18.fc39.noarch 176/591 Installing : urw-base35-standard-symbols-ps-fonts-20200910-18 177/591 Running scriptlet: urw-base35-standard-symbols-ps-fonts-20200910-18 177/591 Installing : urw-base35-z003-fonts-20200910-18.fc39.noarch 178/591 Running scriptlet: urw-base35-z003-fonts-20200910-18.fc39.noarch 178/591 Installing : urw-base35-fonts-20200910-18.fc39.noarch 179/591 Installing : abattis-cantarell-vf-fonts-0.301-10.fc39.noarch 180/591 Installing : mesa-libGLU-9.0.3-1.fc39.aarch64 181/591 Installing : leveldb-1.23-7.fc39.aarch64 182/591 Installing : blosc-1.21.5-2.fc39.aarch64 183/591 Installing : netcdf-4.9.0-5.fc38.aarch64 184/591 Installing : libcusolver-12-3-11.5.4.101-2.aarch64 185/591 Running scriptlet: libcusolver-12-3-11.5.4.101-2.aarch64 185/591 Installing : libnpp-12-3-12.2.3.2-2.aarch64 186/591 Running scriptlet: libnpp-12-3-12.2.3.2-2.aarch64 186/591 Installing : libnvjitlink-12-3-12.3.101-1.aarch64 187/591 Running scriptlet: libnvjitlink-12-3-12.3.101-1.aarch64 187/591 Installing : openblas-openmp64_-0.3.21-6.fc39.aarch64 188/591 Installing : openblas-serial-0.3.21-6.fc39.aarch64 189/591 Installing : openblas-serial64-0.3.21-6.fc39.aarch64 190/591 Installing : openblas-serial64_-0.3.21-6.fc39.aarch64 191/591 Installing : openblas-threads-0.3.21-6.fc39.aarch64 192/591 Installing : openblas-threads64-0.3.21-6.fc39.aarch64 193/591 Installing : openblas-threads64_-0.3.21-6.fc39.aarch64 194/591 Installing : ogdi-4.1.0-11.fc39.aarch64 195/591 Installing : libharu-2.4.3-3.fc39.aarch64 196/591 Installing : zvbi-0.2.35-21.fc39.aarch64 197/591 Running scriptlet: zvbi-0.2.35-21.fc39.aarch64 197/591 Installing : uriparser-0.9.7-3.fc39.aarch64 198/591 Installing : libkml-1.3.0-45.fc39.aarch64 199/591 Installing : zimg-3.0.5-1.fc39.aarch64 200/591 Installing : xkeyboard-config-2.40-1.fc39.noarch 201/591 Installing : libxkbcommon-1.6.0-1.fc39.aarch64 202/591 Installing : libxkbcommon-x11-1.6.0-1.fc39.aarch64 203/591 Installing : xerces-c-3.2.5-1.fc39.aarch64 204/591 Installing : vim-filesystem-2:9.1.264-1.fc39.noarch 205/591 Installing : tzdata-java-2024a-2.fc39.noarch 206/591 Installing : tzdata-2024a-2.fc39.noarch 207/591 Installing : qt-settings-39.1-1.fc39.noarch 208/591 Installing : procps-ng-4.0.3-5.fc39.aarch64 209/591 Installing : openssh-9.3p1-10.fc39.aarch64 210/591 Installing : opencl-headers-3.0-19.20231212git2368105.fc39.no 211/591 Installing : ncurses-6.4-7.20230520.fc39.1.aarch64 212/591 Installing : mesa-filesystem-23.3.6-1.fc39.aarch64 213/591 Installing : mbedtls-2.28.7-1.fc39.aarch64 214/591 Installing : mariadb-connector-c-config-3.3.8-1.fc39.noarch 215/591 Installing : mariadb-connector-c-3.3.8-1.fc39.aarch64 216/591 Installing : libwacom-data-2.10.0-1.fc39.noarch 217/591 Installing : libvpx-1.13.1-1.fc39.aarch64 218/591 Installing : libusb1-1.0.27-1.fc39.aarch64 219/591 Installing : liburing-2.5-1.fc39.aarch64 220/591 Installing : rocksdb-8.1.1-2.fc39.aarch64 221/591 Installing : libubsan-13.2.1-7.fc39.aarch64 222/591 Installing : libstdc++-devel-13.2.1-7.fc39.aarch64 223/591 Installing : libkadm5-1.21.2-3.fc39.aarch64 224/591 Installing : 
libimagequant-4.0.3-2.fc39.aarch64 225/591 Installing : libdeflate-1.20-1.fc39.aarch64 226/591 Installing : libasan-13.2.1-7.fc39.aarch64 227/591 Installing : libarrow-doc-13.0.0-4.fc39.noarch 228/591 Installing : keyutils-libs-devel-1.6.3-1.fc39.aarch64 229/591 Installing : kernel-headers-6.8.3-200.fc39.aarch64 230/591 Installing : libxcrypt-devel-4.4.36-2.fc39.aarch64 231/591 Installing : glibc-devel-2.38-99.fc39.aarch64 232/591 Installing : highway-1.1.0-1.fc39.aarch64 233/591 Installing : libjxl-1:0.8.2-3.fc39.aarch64 234/591 Installing : libaom-3.8.2-1.fc39.aarch64 235/591 Installing : libavif-0.11.1-11.fc39.aarch64 236/591 Running scriptlet: groff-base-1.23.0-3.fc39.aarch64 237/591 Installing : groff-base-1.23.0-3.fc39.aarch64 237/591 Running scriptlet: groff-base-1.23.0-3.fc39.aarch64 237/591 Installing : perl-Digest-1.20-500.fc39.noarch 238/591 Installing : perl-Digest-MD5-2.58-500.fc39.aarch64 239/591 Installing : perl-B-1.88-502.fc39.aarch64 240/591 Installing : perl-FileHandle-2.05-502.fc39.noarch 241/591 Installing : perl-Data-Dumper-2.188-501.fc39.aarch64 242/591 Installing : perl-libnet-3.15-501.fc39.noarch 243/591 Installing : perl-AutoLoader-5.74-502.fc39.noarch 244/591 Installing : perl-base-2.27-502.fc39.noarch 245/591 Installing : perl-URI-5.21-1.fc39.noarch 246/591 Installing : perl-Pod-Escapes-1:1.07-500.fc39.noarch 247/591 Installing : perl-Text-Tabs+Wrap-2023.0511-3.fc39.noarch 248/591 Installing : perl-Time-Local-2:1.350-3.fc39.noarch 249/591 Installing : perl-Net-SSLeay-1.92-10.fc39.aarch64 250/591 Installing : perl-Mozilla-CA-20230801-1.fc39.noarch 251/591 Installing : perl-File-Path-2.18-500.fc39.noarch 252/591 Installing : perl-if-0.61.000-502.fc39.noarch 253/591 Installing : perl-locale-1.10-502.fc39.noarch 254/591 Installing : perl-IO-Socket-IP-0.42-1.fc39.noarch 255/591 Installing : perl-IO-Socket-SSL-2.083-3.fc39.noarch 256/591 Installing : perl-Term-ANSIColor-5.01-501.fc39.noarch 257/591 Installing : perl-Term-Cap-1.18-500.fc39.noarch 258/591 Installing : perl-Class-Struct-0.68-502.fc39.noarch 259/591 Installing : perl-POSIX-2.13-502.fc39.aarch64 260/591 Installing : perl-File-Temp-1:0.231.100-500.fc39.noarch 261/591 Installing : perl-HTTP-Tiny-0.088-3.fc39.noarch 262/591 Installing : perl-Pod-Simple-1:3.45-4.fc39.noarch 263/591 Installing : perl-IPC-Open3-1.22-502.fc39.noarch 264/591 Installing : perl-Socket-4:2.037-3.fc39.aarch64 265/591 Installing : perl-SelectSaver-1.02-502.fc39.noarch 266/591 Installing : perl-Symbol-1.09-502.fc39.noarch 267/591 Installing : perl-podlators-1:5.01-500.fc39.noarch 268/591 Installing : perl-Pod-Perldoc-3.28.01-501.fc39.noarch 269/591 Installing : perl-File-stat-1.13-502.fc39.noarch 270/591 Installing : perl-Text-ParseWords-3.31-500.fc39.noarch 271/591 Installing : perl-Fcntl-1.15-502.fc39.aarch64 272/591 Installing : perl-mro-1.28-502.fc39.aarch64 273/591 Installing : perl-Pod-Usage-4:2.03-500.fc39.noarch 274/591 Installing : perl-IO-1.52-502.fc39.aarch64 275/591 Installing : perl-overloading-0.02-502.fc39.noarch 276/591 Installing : perl-MIME-Base64-3.16-500.fc39.aarch64 277/591 Installing : perl-Scalar-List-Utils-5:1.63-500.fc39.aarch64 278/591 Installing : perl-constant-1.33-501.fc39.noarch 279/591 Installing : perl-parent-1:0.241-500.fc39.noarch 280/591 Installing : perl-Errno-1.37-502.fc39.aarch64 281/591 Installing : perl-File-Basename-2.86-502.fc39.noarch 282/591 Installing : perl-Getopt-Std-1.13-502.fc39.noarch 283/591 Installing : perl-Storable-1:3.32-500.fc39.aarch64 284/591 Installing : 
perl-Getopt-Long-1:2.54-500.fc39.noarch 285/591 Installing : perl-overload-1.37-502.fc39.noarch 286/591 Installing : perl-vars-1.05-502.fc39.noarch 287/591 Installing : perl-Exporter-5.77-500.fc39.noarch 288/591 Installing : perl-PathTools-3.89-500.fc39.aarch64 289/591 Installing : perl-Encode-4:3.19-500.fc39.aarch64 290/591 Installing : perl-DynaLoader-1.54-502.fc39.aarch64 291/591 Installing : perl-Carp-1.54-500.fc39.noarch 292/591 Installing : perl-libs-4:5.38.2-502.fc39.aarch64 293/591 Installing : perl-interpreter-4:5.38.2-502.fc39.aarch64 294/591 Installing : infiniband-diags-46.0-4.fc39.aarch64 295/591 Installing : perl-Error-1:0.17029-13.fc39.noarch 296/591 Installing : perl-TermReadKey-2.38-18.fc39.aarch64 297/591 Installing : perl-File-Find-1.43-502.fc39.noarch 298/591 Installing : perl-lib-0.65-502.fc39.aarch64 299/591 Installing : google-noto-fonts-common-20240101-1.fc39.noarch 300/591 Installing : google-noto-sans-vf-fonts-20240101-1.fc39.noarch 301/591 Installing : default-fonts-core-sans-4.0-9.fc39.noarch 302/591 Installing : google-droid-sans-fonts-20200215-17.fc39.noarch 303/591 Installing : giflib-5.2.2-1.fc39.aarch64 304/591 Installing : emacs-filesystem-1:29.3-1.fc39.noarch 305/591 Installing : annobin-docs-12.46-1.fc39.noarch 306/591 Installing : zlib-devel-1.2.13-4.fc39.aarch64 307/591 Installing : xvidcore-1.3.7-10.fc39.aarch64 308/591 Installing : xapian-core-libs-1.4.23-1.fc39.aarch64 309/591 Installing : vo-amrwbenc-0.1.3-19.fc39.aarch64 310/591 Installing : utf8proc-2.7.0-5.fc39.aarch64 311/591 Installing : twolame-libs-0.4.0-3.fc39.aarch64 312/591 Installing : tcl-1:8.6.12-5.fc39.aarch64 313/591 Installing : speex-1.2.0-15.fc39.aarch64 314/591 Installing : soxr-0.1.3-14.fc39.aarch64 315/591 Installing : rhash-1.4.3-3.fc39.aarch64 316/591 Installing : re2-1:20220601-3.fc39.aarch64 317/591 Installing : libarrow-13.0.0-4.fc39.aarch64 318/591 Installing : python-pip-wheel-23.2.1-1.fc39.noarch 319/591 Installing : pugixml-1.13-3.fc39.aarch64 320/591 Installing : proj-data-9.2.1-2.fc39.noarch 321/591 Installing : poppler-data-0.4.11-5.fc39.noarch 322/591 Installing : pixman-0.42.2-2.fc39.aarch64 323/591 Installing : pcre2-utf32-10.42-1.fc39.2.aarch64 324/591 Installing : pcre2-devel-10.42-1.fc39.2.aarch64 325/591 Installing : pcre-8.45-1.fc39.4.aarch64 326/591 Installing : gklib-5.1.1-20230326.0.git8bd6bad7.fc39.aarch64 327/591 Installing : metis-5.2.1-20230403.0.gite0f1b88b.fc39.aarch64 328/591 Installing : SuperLU-6.0.0-1.fc39.aarch64 329/591 Installing : armadillo-12.8.1-1.fc39.aarch64 330/591 Installing : orc-0.4.33-3.fc39.aarch64 331/591 Installing : opencore-amr-0.1.6-4.fc39.aarch64 332/591 Installing : numactl-libs-2.0.16-3.fc39.aarch64 333/591 Installing : netpbm-11.02.00-2.fc39.aarch64 334/591 Installing : gts-0.7.6-46.20121130.fc39.aarch64 335/591 Installing : mtdev-1.1.6-6.fc39.aarch64 336/591 Installing : mpg123-libs-1.31.3-2.fc39.aarch64 337/591 Installing : libopenmpt-0.6.12-1.fc39.aarch64 338/591 Installing : mpdecimal-2.5.1-7.fc39.aarch64 339/591 Installing : miniz-3.0.2-3.fc39.aarch64 340/591 Installing : lua-lpeg-1.0.2-11.fc39.aarch64 341/591 Installing : lua-json-1.3.4-4.fc39.noarch 342/591 Installing : lua-filesystem-1.8.0-9.fc39.aarch64 343/591 Installing : Lmod-8.7.32-1.fc39.aarch64 344/591 Running scriptlet: Lmod-8.7.32-1.fc39.aarch64 344/591 Installing : lpcnetfreedv-0.5-3.fc39.aarch64 345/591 Installing : codec2-1.2.0-2.fc39.aarch64 346/591 Installing : lksctp-tools-1.0.19-4.fc39.aarch64 347/591 Installing : libyaml-0.2.5-12.fc39.aarch64 
348/591 Installing : libwayland-egl-1.22.0-2.fc39.aarch64 349/591 Installing : libvisual-1:0.4.1-2.fc39.aarch64 350/591 Installing : libverto-devel-0.3.2-6.fc39.aarch64 351/591 Installing : libudfread-1.1.2-6.fc39.aarch64 352/591 Installing : libsepol-devel-3.5-2.fc39.aarch64 353/591 Installing : libselinux-devel-3.5-5.fc39.aarch64 354/591 Installing : libseccomp-2.5.3-6.fc39.aarch64 355/591 Installing : libraw1394-2.1.2-18.fc39.aarch64 356/591 Installing : libdc1394-2.2.7-3.fc39.aarch64 357/591 Installing : librabbitmq-0.13.0-3.fc39.aarch64 358/591 Installing : libqhull_r-1:7.2.1-13.fc39.aarch64 359/591 Installing : libpq-15.3-1.fc39.aarch64 360/591 Installing : libpaper-1:2.1.1-1.fc39.aarch64 361/591 Installing : libmodplug-1:0.8.9.0-17.fc39.aarch64 362/591 Installing : libijs-0.35-19.fc39.aarch64 363/591 Installing : libgta-1.2.1-10.fc39.aarch64 364/591 Installing : libgpg-error-1.47-2.fc39.aarch64 365/591 Installing : libgcrypt-1.10.2-2.fc39.aarch64 366/591 Installing : libglvnd-core-devel-1:1.7.0-1.fc39.aarch64 367/591 Installing : libdatrie-0.2.13-7.fc39.aarch64 368/591 Installing : libthai-0.1.29-6.fc39.aarch64 369/591 Installing : libcom_err-devel-1.47.0-2.fc39.aarch64 370/591 Installing : krb5-devel-1.21.2-3.fc39.aarch64 371/591 Installing : libcbor-0.10.2-2.fc39.aarch64 372/591 Installing : libfido2-1.13.0-3.fc39.aarch64 373/591 Installing : openssh-clients-9.3p1-10.fc39.aarch64 374/591 Running scriptlet: openssh-clients-9.3p1-10.fc39.aarch64 374/591 Installing : libb2-0.98.1-9.fc39.aarch64 375/591 Installing : python3-3.12.2-2.fc39.aarch64 376/591 Installing : python3-libs-3.12.2-2.fc39.aarch64 377/591 Installing : gstreamer1-1.22.9-1.fc39.aarch64 378/591 Installing : cmake-rpm-macros-3.27.7-1.fc39.noarch 379/591 Installing : vapoursynth-libs-63-2.fc39.aarch64 380/591 Installing : onnx-optimizer-0.3.19-20240303.0.gitb3a46118.fc3 381/591 Installing : python3-packaging-23.1-4.fc39.noarch 382/591 Installing : python3-rpm-generators-14-7.fc39.noarch 383/591 Installing : python3-six-1.16.0-12.fc39.noarch 384/591 Installing : crypto-policies-scripts-20231204-1.git1e3a2e4.fc 385/591 Installing : nss-sysinit-3.98.0-1.fc39.aarch64 386/591 Installing : nss-3.98.0-1.fc39.aarch64 387/591 Running scriptlet: nss-3.98.0-1.fc39.aarch64 387/591 Installing : java-17-openjdk-headless-1:17.0.9.0.9-3.fc39.aar 388/591 Running scriptlet: java-17-openjdk-headless-1:17.0.9.0.9-3.fc39.aar 388/591 Installing : byte-buddy-agent-1.14.2-2.fc39.noarch 389/591 Installing : javapackages-tools-6.1.0-10.fc39.noarch 390/591 Installing : objectweb-asm-9.5-2.fc39.noarch 391/591 Installing : byte-buddy-1.14.2-2.fc39.noarch 392/591 Installing : objenesis-3.3-3.fc39.noarch 393/591 Installing : opentest4j-1.2.0-14.fc39.noarch 394/591 Installing : mockito-3.12.4-7.fc39.noarch 395/591 Installing : jacop-4.9.0-2.fc39.noarch 396/591 Installing : libwacom-2.10.0-1.fc39.aarch64 397/591 Installing : libinput-1.25.0-4.fc39.aarch64 398/591 Running scriptlet: libinput-1.25.0-4.fc39.aarch64 398/591 Installing : libX11-common-1.8.7-1.fc39.noarch 399/591 Installing : libX11-1.8.7-1.fc39.aarch64 400/591 Installing : libXext-1.3.5-3.fc39.aarch64 401/591 Installing : libXrender-0.9.11-3.fc39.aarch64 402/591 Installing : libXfixes-6.0.0-6.fc39.aarch64 403/591 Installing : libXcursor-1.2.1-4.fc39.aarch64 404/591 Installing : libXi-1.8.1-2.fc39.aarch64 405/591 Installing : libXv-1.0.11-19.fc39.aarch64 406/591 Installing : libXxf86vm-1.1.5-3.fc39.aarch64 407/591 Installing : libglvnd-glx-1:1.7.0-1.fc39.aarch64 408/591 Installing : 
mesa-libGL-23.3.6-1.fc39.aarch64 409/591 Installing : libva-2.20.0-2.fc39.aarch64 410/591 Installing : glx-utils-9.0.0-3.fc39.aarch64 411/591 Installing : libGLEW-2.2.0-5.fc39.aarch64 412/591 Installing : libvdpau-1.5-4.fc39.aarch64 413/591 Installing : libavutil-free-6.1.1-3.fc39.aarch64 414/591 Installing : libswresample-free-6.1.1-3.fc39.aarch64 415/591 Installing : libswscale-free-6.1.1-3.fc39.aarch64 416/591 Installing : libX11-devel-1.8.7-1.fc39.aarch64 417/591 Installing : libglvnd-devel-1:1.7.0-1.fc39.aarch64 418/591 Installing : libXt-1.2.1-5.fc39.aarch64 419/591 Installing : libXpm-3.5.17-1.fc39.aarch64 420/591 Installing : less-633-2.fc39.aarch64 421/591 Installing : git-core-2.44.0-1.fc39.aarch64 422/591 Installing : git-core-doc-2.44.0-1.fc39.noarch 423/591 Installing : perl-Git-2.44.0-1.fc39.noarch 424/591 Installing : git-2.44.0-1.fc39.aarch64 425/591 Installing : lame-libs-3.100-15.fc39.aarch64 426/591 Installing : kmod-libs-30-6.fc39.aarch64 427/591 Installing : json-c-0.17-1.fc39.aarch64 428/591 Installing : jbigkit-libs-2.1-26.fc39.aarch64 429/591 Installing : libtiff-4.4.0-8.fc39.aarch64 430/591 Installing : proj-9.2.1-2.fc39.aarch64 431/591 Installing : libgeotiff-1.7.1-9.fc39.aarch64 432/591 Installing : libspatialite-5.0.1-23.fc39.aarch64 433/591 Installing : jbig2dec-libs-0.19-10.fc39.aarch64 434/591 Installing : isl-0.16.1-18.fc39.aarch64 435/591 Installing : ilbc-3.0.4-7.fc39.aarch64 436/591 Installing : gsm-1.0.22-3.fc39.aarch64 437/591 Installing : gsl-2.7.1-5.fc39.aarch64 438/591 Installing : graphite2-1.3.14-12.fc39.aarch64 439/591 Installing : cairo-1.18.0-1.fc39.aarch64 440/591 Installing : harfbuzz-8.2.1-2.fc39.aarch64 441/591 Installing : freetype-2.13.1-2.fc39.aarch64 442/591 Installing : fontconfig-2.14.2-6.fc39.aarch64 443/591 Running scriptlet: fontconfig-2.14.2-6.fc39.aarch64 443/591 Installing : poppler-23.08.0-1.fc39.aarch64 444/591 Installing : cairo-gobject-1.18.0-1.fc39.aarch64 445/591 Installing : poppler-glib-23.08.0-1.fc39.aarch64 446/591 Installing : gd-2.3.3-12.fc39.aarch64 447/591 Installing : libXft-2.3.8-3.fc39.aarch64 448/591 Installing : libbluray-1.3.4-3.fc39.aarch64 449/591 Installing : gmp-c++-1:6.2.1-5.fc39.aarch64 450/591 Installing : gmp-devel-1:6.2.1-5.fc39.aarch64 451/591 Installing : gl-manpages-1.1-28.20190306.fc39.noarch 452/591 Installing : gc-8.2.2-4.fc39.aarch64 453/591 Installing : guile22-2.2.7-9.fc39.aarch64 454/591 Installing : make-1:4.4.1-2.fc39.aarch64 455/591 Installing : gcc-13.2.1-7.fc39.aarch64 456/591 Running scriptlet: gcc-13.2.1-7.fc39.aarch64 456/591 Installing : cmake-data-3.27.7-1.fc39.noarch 457/591 Installing : cmake-3.27.7-1.fc39.aarch64 458/591 Installing : pybind11-devel-2.11.1-1.fc39.aarch64 459/591 Installing : gcc-c++-13.2.1-7.fc39.aarch64 460/591 Installing : game-music-emu-0.6.3-12.fc39.aarch64 461/591 Installing : fribidi-1.0.13-2.fc39.aarch64 462/591 Installing : pango-1.51.0-1.fc39.aarch64 463/591 Installing : librsvg2-2.57.1-1.fc39.aarch64 464/591 Installing : rsvg-pixbuf-loader-2.57.1-1.fc39.aarch64 465/591 Installing : gdk-pixbuf2-modules-2.42.10-5.fc39.aarch64 466/591 Installing : openslide-3.4.1-24.fc39.aarch64 467/591 Installing : lasi-1.1.3-11.fc39.aarch64 468/591 Installing : fdk-aac-free-2.0.0-11.fc39.aarch64 469/591 Installing : libavcodec-free-6.1.1-3.fc39.aarch64 470/591 Installing : libchromaprint-1.5.1-13.fc39.aarch64 471/591 Installing : duktape-2.7.0-5.fc39.aarch64 472/591 Installing : libproxy-0.5.3-3.fc39.aarch64 473/591 Installing : qt5-qtbase-common-5.15.12-5.fc39.noarch 
474/591 Running scriptlet: qt5-qtbase-5.15.12-5.fc39.aarch64 475/591 Installing : qt5-qtbase-5.15.12-5.fc39.aarch64 475/591 Running scriptlet: qt5-qtbase-5.15.12-5.fc39.aarch64 475/591 Installing : qt5-qtbase-gui-5.15.12-5.fc39.aarch64 476/591 Installing : gecode-6.2.0-12.fc39.aarch64 477/591 Installing : mp-3.1.0-42.20200303git7fd4828.fc39.aarch64 478/591 Installing : dbus-common-1:1.14.10-1.fc39.noarch 479/591 Running scriptlet: dbus-common-1:1.14.10-1.fc39.noarch 479/591 Running scriptlet: dbus-broker-35-2.fc39.aarch64 480/591 Installing : dbus-broker-35-2.fc39.aarch64 480/591 Running scriptlet: dbus-broker-35-2.fc39.aarch64 480/591 Installing : dbus-1:1.14.10-1.fc39.aarch64 481/591 Installing : systemd-pam-254.10-1.fc39.aarch64 482/591 Installing : systemd-254.10-1.fc39.aarch64 483/591 Running scriptlet: systemd-254.10-1.fc39.aarch64 483/591 Creating group 'input' with GID 104. Creating group 'kvm' with GID 36. Creating group 'render' with GID 105. Creating group 'sgx' with GID 106. Creating group 'systemd-journal' with GID 190. Creating group 'systemd-oom' with GID 999. Creating user 'systemd-oom' (systemd Userspace OOM Killer) with UID 999 and GID 999. Running scriptlet: samba-common-2:4.19.5-1.fc39.noarch 484/591 Installing : samba-common-2:4.19.5-1.fc39.noarch 484/591 Running scriptlet: samba-common-2:4.19.5-1.fc39.noarch 484/591 Running scriptlet: libwbclient-2:4.19.5-1.fc39.aarch64 485/591 Installing : libwbclient-2:4.19.5-1.fc39.aarch64 485/591 Installing : samba-common-libs-2:4.19.5-1.fc39.aarch64 486/591 Installing : samba-client-libs-2:4.19.5-1.fc39.aarch64 487/591 Installing : libsmbclient-2:4.19.5-1.fc39.aarch64 488/591 Installing : cliquer-libs-1.22-6.fc39.aarch64 489/591 Installing : libnauty-2.8.8-1.fc39.aarch64 490/591 Installing : clang16-resource-filesystem-16.0.6-3.fc39.aarch6 491/591 Installing : clang16-libs-16.0.6-3.fc39.aarch64 492/591 Installing : cjson-1.7.15-2.fc39.aarch64 493/591 Running scriptlet: cjson-1.7.15-2.fc39.aarch64 493/591 Installing : librist-0.2.7-2.fc39.aarch64 494/591 Installing : libavformat-free-6.1.1-3.fc39.aarch64 495/591 Installing : cfitsio-4.3.0-1.fc39.aarch64 496/591 Installing : gdal-libs-3.7.3-4.fc39.aarch64 497/591 Installing : vtk-9.2.6-7.fc39.aarch64 498/591 Installing : cdparanoia-libs-10.2-42.fc39.aarch64 499/591 Installing : gstreamer1-plugins-base-1.22.9-1.fc39.aarch64 500/591 Installing : adobe-mappings-pdf-20190401-5.fc39.noarch 501/591 Installing : libgs-10.02.1-2.fc39.aarch64 502/591 Installing : graphviz-8.1.0-6.fc39.aarch64 503/591 Running scriptlet: graphviz-8.1.0-6.fc39.aarch64 503/591 Installing : MUMPS-common-5.5.1-5.fc39.noarch 504/591 Installing : MUMPS-5.5.1-5.fc39.aarch64 505/591 Installing : coin-or-Cbc-2.10.5-13.fc39.aarch64 506/591 Installing : coin-or-Clp-1.17.6-13.fc39.aarch64 507/591 Installing : coin-or-Cgl-0.60.3-10.fc39.aarch64 508/591 Installing : opencv-4.9.0-20231227.1.cu12_3.fc39.aarch64 509/591 Installing : opencv-contrib-4.9.0-20231227.1.cu12_3.fc39.aarc 510/591 Installing : opencv-cuda-4.9.0-20231227.1.cu12_3.fc39.aarch64 511/591 Installing : opencv-core-4.9.0-20231227.1.cu12_3.fc39.aarch64 512/591 Installing : opencv-static-4.9.0-20231227.1.cu12_3.fc39.aarch 513/591 Installing : opencv-devel-4.9.0-20231227.1.cu12_3.fc39.aarch6 514/591 Installing : cuda-nvvm-12-3-12.3.107-1.aarch64 515/591 Installing : cuda-nvtx-12-3-12.3.101-1.aarch64 516/591 Installing : cuda-driver-devel-12-3-12.3.101-1.aarch64 517/591 Installing : cuda-cupti-12-3-12.3.101-1.aarch64 518/591 Installing : 
kineto-0.4.0-20240327.0.git445909a8.cu12_3.fc39. 519/591 Installing : cuda-crt-12-3-12.3.107-1.aarch64 520/591 Installing : cuda-nvcc-12-3-12.3.107-1.aarch64 521/591 Installing : cutlass-3.4.1-20240215.0.cu12_3.fc39.aarch64 522/591 Installing : cuda-cccl-12-3-12.3.101-1.aarch64 523/591 Installing : sleef-3.6-20240320.0.git60e76d2b.fc39.aarch64 524/591 Installing : fp16-1:0-20240410.0.git581ac1c7.fc39.aarch64 525/591 Installing : foxi-0-20210526.1.gitc278588e.fc37.aarch64 526/591 Installing : asmjit-1:0-20220702.1.gitc5984762.fc39.aarch64 527/591 Installing : asmjit-devel-1:0-20220702.1.gitc5984762.fc39.aar 528/591 Installing : foxi-devel-0-20210526.1.gitc278588e.fc37.aarch64 529/591 Installing : fp16-devel-1:0-20240410.0.git581ac1c7.fc39.aarch 530/591 Installing : sleef-devel-3.6-20240320.0.git60e76d2b.fc39.aarc 531/591 Installing : cuda-cudart-devel-12-3-12.3.101-1.aarch64 532/591 Installing : cutlass-devel-3.4.1-20240215.0.cu12_3.fc39.aarch 533/591 Installing : kineto-devel-0.4.0-20240327.0.git445909a8.cu12_3 534/591 Installing : doxygen-2:1.9.7-3.fc39.aarch64 535/591 Installing : python3-pybind11-2.11.1-1.fc39.aarch64 536/591 Installing : annobin-plugin-gcc-12.46-1.fc39.aarch64 537/591 Running scriptlet: annobin-plugin-gcc-12.46-1.fc39.aarch64 537/591 Installing : gcc-plugin-annobin-13.2.1-7.fc39.aarch64 538/591 Running scriptlet: gcc-plugin-annobin-13.2.1-7.fc39.aarch64 538/591 Installing : mesa-libGLU-devel-9.0.3-1.fc39.aarch64 539/591 Installing : mpfr-devel-4.2.0-3.fc39.aarch64 540/591 Installing : cuda-gcc-12-c++-12.3.1-1.fc39.aarch64 541/591 Installing : peachpy-python3-0-20221113.1.git349e8f83.fc39.no 542/591 Installing : python3-devel-3.12.2-2.fc39.aarch64 543/591 Installing : onnx-optimizer-devel-0.3.19-20240303.0.gitb3a461 544/591 Installing : python3-numpy-1:1.24.4-2.fc39.aarch64 545/591 Installing : python3-pyyaml-6.0.1-11.fc39.aarch64 546/591 Installing : python3-setuptools-67.7.2-7.fc39.noarch 547/591 Installing : python3-typing-extensions-4.8.0-1.fc39.noarch 548/591 Installing : zeromq-devel-4.3.4-8.fc39.aarch64 549/591 Installing : miniz-devel-3.0.2-3.fc39.aarch64 550/591 Installing : numactl-devel-2.0.16-3.fc39.aarch64 551/591 Installing : protobuf-compat-devel-3.21.9-2.fc39.aarch64 552/591 Installing : rdma-core-devel-46.0-4.fc39.aarch64 553/591 Installing : rocksdb-devel-8.1.1-2.fc39.aarch64 554/591 Installing : ocl-icd-devel-2.3.2-2.fc39.aarch64 555/591 Installing : openblas-devel-0.3.21-6.fc39.aarch64 556/591 Installing : libnvjitlink-devel-12-3-12.3.101-1.aarch64 557/591 Installing : libcusolver-devel-12-3-11.5.4.101-2.aarch64 558/591 Installing : leveldb-devel-1.23-7.fc39.aarch64 559/591 Installing : tensorpipe-devel-0-20220513.1.gitbb1473a4.fc37.a 560/591 Installing : glog-devel-0.3.5-18.fc39.aarch64 561/591 Installing : lmdb-devel-0.9.32-1.fc39.aarch64 562/591 Installing : qnnpack-devel-0-20190828.2.git7d2a4e99.fc38.aarc 563/591 Installing : nnpack-devel-0-20230201.0.git70a77f48.fc38.aarch 564/591 Installing : magma-devel-2.8.0-20240328.0.cu12_3.fc39.aarch64 565/591 Installing : fftw-devel-3.3.10-10.fc39.aarch64 566/591 Installing : gloo-devel-1:0.5.0-20240302.0.git2565674c.cu12_3 567/591 Installing : libnccl-devel-2.21.5-1+cuda12.4.aarch64 568/591 Running scriptlet: libnccl-devel-2.21.5-1+cuda12.4.aarch64 568/591 Installing : flatbuffers-compiler-23.5.26-3.fc39.aarch64 569/591 Installing : flatbuffers-devel-23.5.26-3.fc39.aarch64 570/591 Installing : hiredis-devel-1.0.2-5.fc39.aarch64 571/591 Installing : tbb-devel-2020.3-20.fc39.aarch64 572/591 Installing : 
libcurand-devel-12-3-10.3.4.107-1.aarch64 573/591 Installing : libcusparse-devel-12-3-12.2.0.103-2.aarch64 574/591 Installing : libcufft-devel-12-3-11.0.12.1-2.aarch64 575/591 Installing : onnx-devel-1.17.0-20240404.0.git4128a090.fc39.aa 576/591 Installing : cpuinfo-devel-1:0-20240327.0.gitf42f5eaf.fc39.aa 577/591 Installing : pthreadpool-devel-1:0.1-20240121.0.git178e3e06.f 578/591 Installing : libcudnn8-devel-8.9.7.29-2.cuda12.3.aarch64 579/591 Running scriptlet: libcudnn8-devel-8.9.7.29-2.cuda12.3.aarch64 579/591 Installing : cuda-nvrtc-devel-12-3-12.3.107-1.aarch64 580/591 Installing : libcublas-devel-12-3-12.3.4.1-2.aarch64 581/591 Installing : snappy-devel-1.1.10-2.fc39.aarch64 582/591 Installing : neon2sse-devel-0-20230131.0.git097a5eca.fc38.noa 583/591 Installing : eigen3-devel-3.4.0-12.fc39.noarch 584/591 Installing : systemd-rpm-macros-254.10-1.fc39.noarch 585/591 Installing : libzstd-devel-1.5.6-1.fc39.aarch64 586/591 Installing : cuda-profiler-api-12-3-12.3.101-1.aarch64 587/591 Installing : cuda-nvml-devel-12-3-12.3.101-1.aarch64 588/591 Installing : psimd-devel-1:0-20200517.2.git072586a7.fc39.noar 589/591 Installing : gemmlowp-devel-0-20231104.0.git16e8662c.fc39.noa 590/591 Installing : fxdiv-devel-1:0-20201208.1.git63058eff.fc39.noar 591/591 Running scriptlet: cuda-toolkit-12-3-config-common-12.3.101-1.noarc 591/591 Running scriptlet: copy-jdk-configs-4.1-3.fc39.noarch 591/591 Running scriptlet: urw-base35-bookman-fonts-20200910-18.fc39.noarch 591/591 Running scriptlet: urw-base35-c059-fonts-20200910-18.fc39.noarch 591/591 Running scriptlet: urw-base35-d050000l-fonts-20200910-18.fc39.noarc 591/591 Running scriptlet: urw-base35-gothic-fonts-20200910-18.fc39.noarch 591/591 Running scriptlet: urw-base35-nimbus-mono-ps-fonts-20200910-18.fc39 591/591 Running scriptlet: urw-base35-nimbus-roman-fonts-20200910-18.fc39.n 591/591 Running scriptlet: urw-base35-nimbus-sans-fonts-20200910-18.fc39.no 591/591 Running scriptlet: urw-base35-p052-fonts-20200910-18.fc39.noarch 591/591 Running scriptlet: urw-base35-standard-symbols-ps-fonts-20200910-18 591/591 Running scriptlet: urw-base35-z003-fonts-20200910-18.fc39.noarch 591/591 Running scriptlet: crypto-policies-scripts-20231204-1.git1e3a2e4.fc 591/591 Running scriptlet: nss-3.98.0-1.fc39.aarch64 591/591 Running scriptlet: java-17-openjdk-headless-1:17.0.9.0.9-3.fc39.aar 591/591 Running scriptlet: fontconfig-2.14.2-6.fc39.aarch64 591/591 Running scriptlet: fxdiv-devel-1:0-20201208.1.git63058eff.fc39.noar 591/591 Verifying : asmjit-1:0-20220702.1.gitc5984762.fc39.aarch64 1/591 Verifying : asmjit-devel-1:0-20220702.1.gitc5984762.fc39.aar 2/591 Verifying : cpuinfo-1:0-20240327.0.gitf42f5eaf.fc39.aarch64 3/591 Verifying : cpuinfo-devel-1:0-20240327.0.gitf42f5eaf.fc39.aa 4/591 Verifying : cuda-gcc-12-12.3.1-1.fc39.aarch64 5/591 Verifying : cuda-gcc-12-c++-12.3.1-1.fc39.aarch64 6/591 Verifying : cutlass-3.4.1-20240215.0.cu12_3.fc39.aarch64 7/591 Verifying : cutlass-devel-3.4.1-20240215.0.cu12_3.fc39.aarch 8/591 Verifying : foxi-0-20210526.1.gitc278588e.fc37.aarch64 9/591 Verifying : foxi-devel-0-20210526.1.gitc278588e.fc37.aarch64 10/591 Verifying : fp16-1:0-20240410.0.git581ac1c7.fc39.aarch64 11/591 Verifying : fp16-devel-1:0-20240410.0.git581ac1c7.fc39.aarch 12/591 Verifying : fxdiv-devel-1:0-20201208.1.git63058eff.fc39.noar 13/591 Verifying : gemmlowp-devel-0-20231104.0.git16e8662c.fc39.noa 14/591 Verifying : gklib-5.1.1-20230326.0.git8bd6bad7.fc39.aarch64 15/591 Verifying : glibc-devel-2.38-99.fc39.aarch64 16/591 Verifying : 
gloo-1:0.5.0-20240302.0.git2565674c.cu12_3.fc39. 17/591 Verifying : gloo-devel-1:0.5.0-20240302.0.git2565674c.cu12_3 18/591 Verifying : halide-17.0.1-20240220.0.fc39.aarch64 19/591 Verifying : kineto-0.4.0-20240327.0.git445909a8.cu12_3.fc39. 20/591 Verifying : kineto-devel-0.4.0-20240327.0.git445909a8.cu12_3 21/591 Verifying : magma-2.8.0-20240328.0.cu12_3.fc39.aarch64 22/591 Verifying : magma-devel-2.8.0-20240328.0.cu12_3.fc39.aarch64 23/591 Verifying : metis-5.2.1-20230403.0.gite0f1b88b.fc39.aarch64 24/591 Verifying : neon2sse-devel-0-20230131.0.git097a5eca.fc38.noa 25/591 Verifying : nnpack-0-20230201.0.git70a77f48.fc38.aarch64 26/591 Verifying : nnpack-devel-0-20230201.0.git70a77f48.fc38.aarch 27/591 Verifying : onnx-devel-1.17.0-20240404.0.git4128a090.fc39.aa 28/591 Verifying : onnx-libs-1.17.0-20240404.0.git4128a090.fc39.aar 29/591 Verifying : onnx-optimizer-0.3.19-20240303.0.gitb3a46118.fc3 30/591 Verifying : onnx-optimizer-devel-0.3.19-20240303.0.gitb3a461 31/591 Verifying : opencv-4.9.0-20231227.1.cu12_3.fc39.aarch64 32/591 Verifying : opencv-contrib-4.9.0-20231227.1.cu12_3.fc39.aarc 33/591 Verifying : opencv-core-4.9.0-20231227.1.cu12_3.fc39.aarch64 34/591 Verifying : opencv-cuda-4.9.0-20231227.1.cu12_3.fc39.aarch64 35/591 Verifying : opencv-devel-4.9.0-20231227.1.cu12_3.fc39.aarch6 36/591 Verifying : opencv-static-4.9.0-20231227.1.cu12_3.fc39.aarch 37/591 Verifying : peachpy-python3-0-20221113.1.git349e8f83.fc39.no 38/591 Verifying : protobuf-compat-3.21.9-2.fc39.aarch64 39/591 Verifying : protobuf-compat-compiler-3.21.9-2.fc39.aarch64 40/591 Verifying : protobuf-compat-devel-3.21.9-2.fc39.aarch64 41/591 Verifying : psimd-devel-1:0-20200517.2.git072586a7.fc39.noar 42/591 Verifying : pthreadpool-1:0.1-20240121.0.git178e3e06.fc39.aa 43/591 Verifying : pthreadpool-devel-1:0.1-20240121.0.git178e3e06.f 44/591 Verifying : qnnpack-0-20190828.2.git7d2a4e99.fc38.aarch64 45/591 Verifying : qnnpack-devel-0-20190828.2.git7d2a4e99.fc38.aarc 46/591 Verifying : sleef-3.6-20240320.0.git60e76d2b.fc39.aarch64 47/591 Verifying : sleef-devel-3.6-20240320.0.git60e76d2b.fc39.aarc 48/591 Verifying : tensorpipe-0-20220513.1.gitbb1473a4.fc37.aarch64 49/591 Verifying : tensorpipe-devel-0-20220513.1.gitbb1473a4.fc37.a 50/591 Verifying : libcublas-12-3-12.3.4.1-2.aarch64 51/591 Verifying : libcublas-devel-12-3-12.3.4.1-2.aarch64 52/591 Verifying : libcudnn8-8.9.7.29-2.cuda12.3.aarch64 53/591 Verifying : libcudnn8-devel-8.9.7.29-2.cuda12.3.aarch64 54/591 Verifying : libcufft-12-3-11.0.12.1-2.aarch64 55/591 Verifying : libcufft-devel-12-3-11.0.12.1-2.aarch64 56/591 Verifying : libcusolver-12-3-11.5.4.101-2.aarch64 57/591 Verifying : libcusolver-devel-12-3-11.5.4.101-2.aarch64 58/591 Verifying : libcusparse-12-3-12.2.0.103-2.aarch64 59/591 Verifying : libcusparse-devel-12-3-12.2.0.103-2.aarch64 60/591 Verifying : libnpp-12-3-12.2.3.2-2.aarch64 61/591 Verifying : cuda-toolkit-12-3-config-common-12.3.101-1.noarc 62/591 Verifying : cuda-toolkit-12-config-common-12.4.127-1.noarch 63/591 Verifying : cuda-toolkit-config-common-12.4.127-1.noarch 64/591 Verifying : cuda-cccl-12-3-12.3.101-1.aarch64 65/591 Verifying : cuda-crt-12-3-12.3.107-1.aarch64 66/591 Verifying : cuda-cudart-12-3-12.3.101-1.aarch64 67/591 Verifying : cuda-cudart-devel-12-3-12.3.101-1.aarch64 68/591 Verifying : cuda-cupti-12-3-12.3.101-1.aarch64 69/591 Verifying : cuda-driver-devel-12-3-12.3.101-1.aarch64 70/591 Verifying : cuda-nvcc-12-3-12.3.107-1.aarch64 71/591 Verifying : cuda-nvml-devel-12-3-12.3.101-1.aarch64 72/591 Verifying : 
cuda-nvrtc-12-3-12.3.107-1.aarch64 73/591 Verifying : cuda-nvrtc-devel-12-3-12.3.107-1.aarch64 74/591 Verifying : cuda-nvtx-12-3-12.3.101-1.aarch64 75/591 Verifying : cuda-nvvm-12-3-12.3.107-1.aarch64 76/591 Verifying : cuda-profiler-api-12-3-12.3.101-1.aarch64 77/591 Verifying : libcurand-12-3-10.3.4.107-1.aarch64 78/591 Verifying : libcurand-devel-12-3-10.3.4.107-1.aarch64 79/591 Verifying : libnccl-2.21.5-1+cuda12.4.aarch64 80/591 Verifying : libnccl-devel-2.21.5-1+cuda12.4.aarch64 81/591 Verifying : libnvjitlink-12-3-12.3.101-1.aarch64 82/591 Verifying : libnvjitlink-devel-12-3-12.3.101-1.aarch64 83/591 Verifying : Lmod-8.7.32-1.fc39.aarch64 84/591 Verifying : MUMPS-5.5.1-5.fc39.aarch64 85/591 Verifying : MUMPS-common-5.5.1-5.fc39.noarch 86/591 Verifying : SuperLU-6.0.0-1.fc39.aarch64 87/591 Verifying : abattis-cantarell-vf-fonts-0.301-10.fc39.noarch 88/591 Verifying : adobe-mappings-cmap-20230622-1.fc39.noarch 89/591 Verifying : adobe-mappings-cmap-deprecated-20230622-1.fc39.n 90/591 Verifying : adobe-mappings-pdf-20190401-5.fc39.noarch 91/591 Verifying : avahi-libs-0.8-24.fc39.aarch64 92/591 Verifying : byte-buddy-1.14.2-2.fc39.noarch 93/591 Verifying : byte-buddy-agent-1.14.2-2.fc39.noarch 94/591 Verifying : cairo-1.18.0-1.fc39.aarch64 95/591 Verifying : cairo-gobject-1.18.0-1.fc39.aarch64 96/591 Verifying : cdparanoia-libs-10.2-42.fc39.aarch64 97/591 Verifying : ceres-solver-2.1.0-6.fc39.aarch64 98/591 Verifying : cfitsio-4.3.0-1.fc39.aarch64 99/591 Verifying : cgnslib-libs-4.4.0-2.fc39.aarch64 100/591 Verifying : cjson-1.7.15-2.fc39.aarch64 101/591 Verifying : clang16-libs-16.0.6-3.fc39.aarch64 102/591 Verifying : clang16-resource-filesystem-16.0.6-3.fc39.aarch6 103/591 Verifying : cliquer-libs-1.22-6.fc39.aarch64 104/591 Verifying : cmake-3.27.7-1.fc39.aarch64 105/591 Verifying : cmake-data-3.27.7-1.fc39.noarch 106/591 Verifying : cmake-filesystem-3.27.7-1.fc39.aarch64 107/591 Verifying : cmake-rpm-macros-3.27.7-1.fc39.noarch 108/591 Verifying : codec2-1.2.0-2.fc39.aarch64 109/591 Verifying : coin-or-Cbc-2.10.5-13.fc39.aarch64 110/591 Verifying : coin-or-Cgl-0.60.3-10.fc39.aarch64 111/591 Verifying : coin-or-Clp-1.17.6-13.fc39.aarch64 112/591 Verifying : coin-or-CoinUtils-2.11.4-10.fc39.aarch64 113/591 Verifying : coin-or-Osi-0.108.6-9.fc39.aarch64 114/591 Verifying : copy-jdk-configs-4.1-3.fc39.noarch 115/591 Verifying : dbus-1:1.14.10-1.fc39.aarch64 116/591 Verifying : dbus-common-1:1.14.10-1.fc39.noarch 117/591 Verifying : dbus-libs-1:1.14.10-1.fc39.aarch64 118/591 Verifying : default-fonts-core-sans-4.0-9.fc39.noarch 119/591 Verifying : double-conversion-3.1.5-9.fc39.aarch64 120/591 Verifying : doxygen-2:1.9.7-3.fc39.aarch64 121/591 Verifying : duktape-2.7.0-5.fc39.aarch64 122/591 Verifying : eigen3-devel-3.4.0-12.fc39.noarch 123/591 Verifying : fdk-aac-free-2.0.0-11.fc39.aarch64 124/591 Verifying : flatbuffers-23.5.26-3.fc39.aarch64 125/591 Verifying : flatbuffers-compiler-23.5.26-3.fc39.aarch64 126/591 Verifying : flatbuffers-devel-23.5.26-3.fc39.aarch64 127/591 Verifying : fonts-filesystem-1:2.0.5-12.fc39.noarch 128/591 Verifying : freetype-2.13.1-2.fc39.aarch64 129/591 Verifying : freexl-2.0.0-2.fc39.aarch64 130/591 Verifying : fribidi-1.0.13-2.fc39.aarch64 131/591 Verifying : game-music-emu-0.6.3-12.fc39.aarch64 132/591 Verifying : gc-8.2.2-4.fc39.aarch64 133/591 Verifying : gd-2.3.3-12.fc39.aarch64 134/591 Verifying : gdk-pixbuf2-2.42.10-5.fc39.aarch64 135/591 Verifying : gdk-pixbuf2-modules-2.42.10-5.fc39.aarch64 136/591 Verifying : gecode-6.2.0-12.fc39.aarch64 137/591 
Verifying : gflags-2.2.2-12.fc39.aarch64 138/591 Verifying : gflags-devel-2.2.2-12.fc39.aarch64 139/591 Verifying : gl-manpages-1.1-28.20190306.fc39.noarch 140/591 Verifying : glog-0.3.5-18.fc39.aarch64 141/591 Verifying : glog-devel-0.3.5-18.fc39.aarch64 142/591 Verifying : glpk-5.0-7.fc39.aarch64 143/591 Verifying : glx-utils-9.0.0-3.fc39.aarch64 144/591 Verifying : gmp-c++-1:6.2.1-5.fc39.aarch64 145/591 Verifying : gmp-devel-1:6.2.1-5.fc39.aarch64 146/591 Verifying : google-droid-sans-fonts-20200215-17.fc39.noarch 147/591 Verifying : graphene-1.10.6-6.fc39.aarch64 148/591 Verifying : graphite2-1.3.14-12.fc39.aarch64 149/591 Verifying : gsl-2.7.1-5.fc39.aarch64 150/591 Verifying : gsm-1.0.22-3.fc39.aarch64 151/591 Verifying : gts-0.7.6-46.20121130.fc39.aarch64 152/591 Verifying : guile22-2.2.7-9.fc39.aarch64 153/591 Verifying : harfbuzz-8.2.1-2.fc39.aarch64 154/591 Verifying : hdf-libs-4.2.15-13.fc39.aarch64 155/591 Verifying : hdf5-1.12.1-12.fc39.aarch64 156/591 Verifying : hiredis-1.0.2-5.fc39.aarch64 157/591 Verifying : hiredis-devel-1.0.2-5.fc39.aarch64 158/591 Verifying : ilbc-3.0.4-7.fc39.aarch64 159/591 Verifying : infiniband-diags-46.0-4.fc39.aarch64 160/591 Verifying : isl-0.16.1-18.fc39.aarch64 161/591 Verifying : iso-codes-4.15.0-2.fc39.noarch 162/591 Verifying : jacop-4.9.0-2.fc39.noarch 163/591 Verifying : javapackages-filesystem-6.1.0-10.fc39.noarch 164/591 Verifying : javapackages-tools-6.1.0-10.fc39.noarch 165/591 Verifying : jbig2dec-libs-0.19-10.fc39.aarch64 166/591 Verifying : jbigkit-libs-2.1-26.fc39.aarch64 167/591 Verifying : json-c-0.17-1.fc39.aarch64 168/591 Verifying : jsoncpp-1.9.5-5.fc39.aarch64 169/591 Verifying : kmod-libs-30-6.fc39.aarch64 170/591 Verifying : lame-libs-3.100-15.fc39.aarch64 171/591 Verifying : lasi-1.1.3-11.fc39.aarch64 172/591 Verifying : lcms2-2.15-2.fc39.aarch64 173/591 Verifying : less-633-2.fc39.aarch64 174/591 Verifying : leveldb-1.23-7.fc39.aarch64 175/591 Verifying : leveldb-devel-1.23-7.fc39.aarch64 176/591 Verifying : libGLEW-2.2.0-5.fc39.aarch64 177/591 Verifying : libICE-1.0.10-11.fc39.aarch64 178/591 Verifying : libSM-1.2.3-13.fc39.aarch64 179/591 Verifying : libX11-1.8.7-1.fc39.aarch64 180/591 Verifying : libX11-common-1.8.7-1.fc39.noarch 181/591 Verifying : libX11-devel-1.8.7-1.fc39.aarch64 182/591 Verifying : libX11-xcb-1.8.7-1.fc39.aarch64 183/591 Verifying : libXau-1.0.11-3.fc39.aarch64 184/591 Verifying : libXau-devel-1.0.11-3.fc39.aarch64 185/591 Verifying : libXcursor-1.2.1-4.fc39.aarch64 186/591 Verifying : libXext-1.3.5-3.fc39.aarch64 187/591 Verifying : libXfixes-6.0.0-6.fc39.aarch64 188/591 Verifying : libXft-2.3.8-3.fc39.aarch64 189/591 Verifying : libXi-1.8.1-2.fc39.aarch64 190/591 Verifying : libXrender-0.9.11-3.fc39.aarch64 191/591 Verifying : libXt-1.2.1-5.fc39.aarch64 192/591 Verifying : libXv-1.0.11-19.fc39.aarch64 193/591 Verifying : libXxf86vm-1.1.5-3.fc39.aarch64 194/591 Verifying : libavif-0.11.1-11.fc39.aarch64 195/591 Verifying : libb2-0.98.1-9.fc39.aarch64 196/591 Verifying : libbluray-1.3.4-3.fc39.aarch64 197/591 Verifying : libcbor-0.10.2-2.fc39.aarch64 198/591 Verifying : libchromaprint-1.5.1-13.fc39.aarch64 199/591 Verifying : libcom_err-devel-1.47.0-2.fc39.aarch64 200/591 Verifying : libdatrie-0.2.13-7.fc39.aarch64 201/591 Verifying : libdav1d-1.2.1-2.fc39.aarch64 202/591 Verifying : libdc1394-2.2.7-3.fc39.aarch64 203/591 Verifying : libedit-3.1-48.20230828cvs.fc39.aarch64 204/591 Verifying : libevdev-1.13.1-2.fc39.aarch64 205/591 Verifying : libfido2-1.13.0-3.fc39.aarch64 206/591 Verifying : 
libgcrypt-1.10.2-2.fc39.aarch64 207/591 Verifying : libgeotiff-1.7.1-9.fc39.aarch64 208/591 Verifying : libglvnd-1:1.7.0-1.fc39.aarch64 209/591 Verifying : libglvnd-core-devel-1:1.7.0-1.fc39.aarch64 210/591 Verifying : libglvnd-devel-1:1.7.0-1.fc39.aarch64 211/591 Verifying : libglvnd-egl-1:1.7.0-1.fc39.aarch64 212/591 Verifying : libglvnd-gles-1:1.7.0-1.fc39.aarch64 213/591 Verifying : libglvnd-glx-1:1.7.0-1.fc39.aarch64 214/591 Verifying : libglvnd-opengl-1:1.7.0-1.fc39.aarch64 215/591 Verifying : libgpg-error-1.47-2.fc39.aarch64 216/591 Verifying : libgta-1.2.1-10.fc39.aarch64 217/591 Verifying : libgudev-238-2.fc39.aarch64 218/591 Verifying : libharu-2.4.3-3.fc39.aarch64 219/591 Verifying : libibumad-46.0-4.fc39.aarch64 220/591 Verifying : libibverbs-46.0-4.fc39.aarch64 221/591 Verifying : libicu-73.2-2.fc39.aarch64 222/591 Verifying : libijs-0.35-19.fc39.aarch64 223/591 Verifying : libjpeg-turbo-2.1.4-3.fc39.aarch64 224/591 Verifying : libjxl-1:0.8.2-3.fc39.aarch64 225/591 Verifying : libkml-1.3.0-45.fc39.aarch64 226/591 Verifying : libldb-2.8.0-1.fc39.aarch64 227/591 Verifying : liblerc-4.0.0-4.fc39.aarch64 228/591 Verifying : libmodplug-1:0.8.9.0-17.fc39.aarch64 229/591 Verifying : libmpc-1.3.1-3.fc39.aarch64 230/591 Verifying : libogg-2:1.3.5-6.fc39.aarch64 231/591 Verifying : libpaper-1:2.1.1-1.fc39.aarch64 232/591 Verifying : libpng-2:1.6.37-15.fc39.aarch64 233/591 Verifying : libpq-15.3-1.fc39.aarch64 234/591 Verifying : libqhull_r-1:7.2.1-13.fc39.aarch64 235/591 Verifying : librabbitmq-0.13.0-3.fc39.aarch64 236/591 Verifying : libraw1394-2.1.2-18.fc39.aarch64 237/591 Verifying : librdmacm-46.0-4.fc39.aarch64 238/591 Verifying : librist-0.2.7-2.fc39.aarch64 239/591 Verifying : librttopo-1.1.0-12.fc39.aarch64 240/591 Verifying : libseccomp-2.5.3-6.fc39.aarch64 241/591 Verifying : libselinux-devel-3.5-5.fc39.aarch64 242/591 Verifying : libsepol-devel-3.5-2.fc39.aarch64 243/591 Verifying : libspatialite-5.0.1-23.fc39.aarch64 244/591 Verifying : libtalloc-2.4.1-1.fc39.aarch64 245/591 Verifying : libtdb-1.4.9-1.fc39.aarch64 246/591 Verifying : libtevent-0.15.0-1.fc39.aarch64 247/591 Verifying : libthai-0.1.29-6.fc39.aarch64 248/591 Verifying : libtheora-1:1.1.1-34.fc39.aarch64 249/591 Verifying : libtiff-4.4.0-8.fc39.aarch64 250/591 Verifying : libtool-ltdl-2.4.7-7.fc39.aarch64 251/591 Verifying : libudfread-1.1.2-6.fc39.aarch64 252/591 Verifying : libunwind-1.7.0-0.2.rc2.fc39.aarch64 253/591 Verifying : libunwind-devel-1.7.0-0.2.rc2.fc39.aarch64 254/591 Verifying : libvdpau-1.5-4.fc39.aarch64 255/591 Verifying : libverto-devel-0.3.2-6.fc39.aarch64 256/591 Verifying : libvisual-1:0.4.1-2.fc39.aarch64 257/591 Verifying : libvorbis-1:1.3.7-8.fc39.aarch64 258/591 Verifying : libwayland-client-1.22.0-2.fc39.aarch64 259/591 Verifying : libwayland-cursor-1.22.0-2.fc39.aarch64 260/591 Verifying : libwayland-egl-1.22.0-2.fc39.aarch64 261/591 Verifying : libwayland-server-1.22.0-2.fc39.aarch64 262/591 Verifying : libwebp-1.3.2-2.fc39.aarch64 263/591 Verifying : libxcb-1.13.1-12.fc39.aarch64 264/591 Verifying : libxcb-devel-1.13.1-12.fc39.aarch64 265/591 Verifying : libxcrypt-devel-4.4.36-2.fc39.aarch64 266/591 Verifying : libxshmfence-1.3-13.fc39.aarch64 267/591 Verifying : libyaml-0.2.5-12.fc39.aarch64 268/591 Verifying : lksctp-tools-1.0.19-4.fc39.aarch64 269/591 Verifying : llvm16-libs-16.0.6-5.fc39.aarch64 270/591 Verifying : lpcnetfreedv-0.5-3.fc39.aarch64 271/591 Verifying : lua-5.4.6-3.fc39.aarch64 272/591 Verifying : lua-filesystem-1.8.0-9.fc39.aarch64 273/591 Verifying : 
lua-json-1.3.4-4.fc39.noarch 274/591 Verifying : lua-lpeg-1.0.2-11.fc39.aarch64 275/591 Verifying : lua-posix-36.2.1-3.fc39.aarch64 276/591 Verifying : lua-term-0.07-18.fc39.aarch64 277/591 Verifying : make-1:4.4.1-2.fc39.aarch64 278/591 Verifying : mesa-libGLU-9.0.3-1.fc39.aarch64 279/591 Verifying : mesa-libGLU-devel-9.0.3-1.fc39.aarch64 280/591 Verifying : miniz-3.0.2-3.fc39.aarch64 281/591 Verifying : miniz-devel-3.0.2-3.fc39.aarch64 282/591 Verifying : mockito-3.12.4-7.fc39.noarch 283/591 Verifying : mp-3.1.0-42.20200303git7fd4828.fc39.aarch64 284/591 Verifying : mpdecimal-2.5.1-7.fc39.aarch64 285/591 Verifying : mpfr-devel-4.2.0-3.fc39.aarch64 286/591 Verifying : mpg123-libs-1.31.3-2.fc39.aarch64 287/591 Verifying : mtdev-1.1.6-6.fc39.aarch64 288/591 Verifying : netcdf-4.9.0-5.fc38.aarch64 289/591 Verifying : netpbm-11.02.00-2.fc39.aarch64 290/591 Verifying : nettle-3.9.1-2.fc39.aarch64 291/591 Verifying : numactl-devel-2.0.16-3.fc39.aarch64 292/591 Verifying : numactl-libs-2.0.16-3.fc39.aarch64 293/591 Verifying : objectweb-asm-9.5-2.fc39.noarch 294/591 Verifying : objenesis-3.3-3.fc39.noarch 295/591 Verifying : ocl-icd-2.3.2-2.fc39.aarch64 296/591 Verifying : ocl-icd-devel-2.3.2-2.fc39.aarch64 297/591 Verifying : ogdi-4.1.0-11.fc39.aarch64 298/591 Verifying : openblas-0.3.21-6.fc39.aarch64 299/591 Verifying : openblas-devel-0.3.21-6.fc39.aarch64 300/591 Verifying : openblas-openmp-0.3.21-6.fc39.aarch64 301/591 Verifying : openblas-openmp64-0.3.21-6.fc39.aarch64 302/591 Verifying : openblas-openmp64_-0.3.21-6.fc39.aarch64 303/591 Verifying : openblas-serial-0.3.21-6.fc39.aarch64 304/591 Verifying : openblas-serial64-0.3.21-6.fc39.aarch64 305/591 Verifying : openblas-serial64_-0.3.21-6.fc39.aarch64 306/591 Verifying : openblas-threads-0.3.21-6.fc39.aarch64 307/591 Verifying : openblas-threads64-0.3.21-6.fc39.aarch64 308/591 Verifying : openblas-threads64_-0.3.21-6.fc39.aarch64 309/591 Verifying : opencore-amr-0.1.6-4.fc39.aarch64 310/591 Verifying : openexr-libs-3.1.10-2.fc39.aarch64 311/591 Verifying : openpgm-5.2.122-32.fc39.aarch64 312/591 Verifying : openpgm-devel-5.2.122-32.fc39.aarch64 313/591 Verifying : openslide-3.4.1-24.fc39.aarch64 314/591 Verifying : opentest4j-1.2.0-14.fc39.noarch 315/591 Verifying : opus-1.3.1-13.fc39.aarch64 316/591 Verifying : orc-0.4.33-3.fc39.aarch64 317/591 Verifying : pango-1.51.0-1.fc39.aarch64 318/591 Verifying : pcre-8.45-1.fc39.4.aarch64 319/591 Verifying : pcre2-devel-10.42-1.fc39.2.aarch64 320/591 Verifying : pcre2-utf16-10.42-1.fc39.2.aarch64 321/591 Verifying : pcre2-utf32-10.42-1.fc39.2.aarch64 322/591 Verifying : perl-Carp-1.54-500.fc39.noarch 323/591 Verifying : perl-Data-Dumper-2.188-501.fc39.aarch64 324/591 Verifying : perl-Digest-1.20-500.fc39.noarch 325/591 Verifying : perl-Digest-MD5-2.58-500.fc39.aarch64 326/591 Verifying : perl-Encode-4:3.19-500.fc39.aarch64 327/591 Verifying : perl-Error-1:0.17029-13.fc39.noarch 328/591 Verifying : perl-Exporter-5.77-500.fc39.noarch 329/591 Verifying : perl-File-Path-2.18-500.fc39.noarch 330/591 Verifying : perl-File-Temp-1:0.231.100-500.fc39.noarch 331/591 Verifying : perl-Getopt-Long-1:2.54-500.fc39.noarch 332/591 Verifying : perl-HTTP-Tiny-0.088-3.fc39.noarch 333/591 Verifying : perl-IO-Socket-IP-0.42-1.fc39.noarch 334/591 Verifying : perl-IO-Socket-SSL-2.083-3.fc39.noarch 335/591 Verifying : perl-MIME-Base64-3.16-500.fc39.aarch64 336/591 Verifying : perl-Mozilla-CA-20230801-1.fc39.noarch 337/591 Verifying : perl-Net-SSLeay-1.92-10.fc39.aarch64 338/591 Verifying : 
perl-PathTools-3.89-500.fc39.aarch64 339/591 Verifying : perl-Pod-Escapes-1:1.07-500.fc39.noarch 340/591 Verifying : perl-Pod-Perldoc-3.28.01-501.fc39.noarch 341/591 Verifying : perl-Pod-Simple-1:3.45-4.fc39.noarch 342/591 Verifying : perl-Pod-Usage-4:2.03-500.fc39.noarch 343/591 Verifying : perl-Scalar-List-Utils-5:1.63-500.fc39.aarch64 344/591 Verifying : perl-Socket-4:2.037-3.fc39.aarch64 345/591 Verifying : perl-Storable-1:3.32-500.fc39.aarch64 346/591 Verifying : perl-Term-ANSIColor-5.01-501.fc39.noarch 347/591 Verifying : perl-Term-Cap-1.18-500.fc39.noarch 348/591 Verifying : perl-TermReadKey-2.38-18.fc39.aarch64 349/591 Verifying : perl-Text-ParseWords-3.31-500.fc39.noarch 350/591 Verifying : perl-Text-Tabs+Wrap-2023.0511-3.fc39.noarch 351/591 Verifying : perl-Time-Local-2:1.350-3.fc39.noarch 352/591 Verifying : perl-URI-5.21-1.fc39.noarch 353/591 Verifying : perl-constant-1.33-501.fc39.noarch 354/591 Verifying : perl-libnet-3.15-501.fc39.noarch 355/591 Verifying : perl-parent-1:0.241-500.fc39.noarch 356/591 Verifying : perl-podlators-1:5.01-500.fc39.noarch 357/591 Verifying : pixman-0.42.2-2.fc39.aarch64 358/591 Verifying : poppler-23.08.0-1.fc39.aarch64 359/591 Verifying : poppler-data-0.4.11-5.fc39.noarch 360/591 Verifying : poppler-glib-23.08.0-1.fc39.aarch64 361/591 Verifying : proj-9.2.1-2.fc39.aarch64 362/591 Verifying : proj-data-9.2.1-2.fc39.noarch 363/591 Verifying : protobuf-3.19.6-6.fc39.aarch64 364/591 Verifying : pugixml-1.13-3.fc39.aarch64 365/591 Verifying : pybind11-devel-2.11.1-1.fc39.aarch64 366/591 Verifying : python-pip-wheel-23.2.1-1.fc39.noarch 367/591 Verifying : python-rpm-macros-3.12-4.fc39.noarch 368/591 Verifying : python3-numpy-1:1.24.4-2.fc39.aarch64 369/591 Verifying : python3-packaging-23.1-4.fc39.noarch 370/591 Verifying : python3-pybind11-2.11.1-1.fc39.aarch64 371/591 Verifying : python3-pyyaml-6.0.1-11.fc39.aarch64 372/591 Verifying : python3-rpm-generators-14-7.fc39.noarch 373/591 Verifying : python3-rpm-macros-3.12-4.fc39.noarch 374/591 Verifying : python3-setuptools-67.7.2-7.fc39.noarch 375/591 Verifying : python3-six-1.16.0-12.fc39.noarch 376/591 Verifying : python3-typing-extensions-4.8.0-1.fc39.noarch 377/591 Verifying : rdma-core-devel-46.0-4.fc39.aarch64 378/591 Verifying : re2-1:20220601-3.fc39.aarch64 379/591 Verifying : rhash-1.4.3-3.fc39.aarch64 380/591 Verifying : rocksdb-8.1.1-2.fc39.aarch64 381/591 Verifying : rocksdb-devel-8.1.1-2.fc39.aarch64 382/591 Verifying : scotch-7.0.3-3.fc39.aarch64 383/591 Verifying : scotch-devel-7.0.3-3.fc39.aarch64 384/591 Verifying : shared-mime-info-2.2-4.fc39.aarch64 385/591 Verifying : snappy-1.1.10-2.fc39.aarch64 386/591 Verifying : snappy-devel-1.1.10-2.fc39.aarch64 387/591 Verifying : soxr-0.1.3-14.fc39.aarch64 388/591 Verifying : speex-1.2.0-15.fc39.aarch64 389/591 Verifying : srt-libs-1.5.3-1.fc39.aarch64 390/591 Verifying : suitesparse-5.13.0-3.fc39.aarch64 391/591 Verifying : svt-av1-libs-1.4.1-3.fc39.aarch64 392/591 Verifying : tbb-2020.3-20.fc39.aarch64 393/591 Verifying : tbb-devel-2020.3-20.fc39.aarch64 394/591 Verifying : tcl-1:8.6.12-5.fc39.aarch64 395/591 Verifying : twolame-libs-0.4.0-3.fc39.aarch64 396/591 Verifying : unixODBC-2.3.11-4.fc39.aarch64 397/591 Verifying : uriparser-0.9.7-3.fc39.aarch64 398/591 Verifying : urw-base35-bookman-fonts-20200910-18.fc39.noarch 399/591 Verifying : urw-base35-c059-fonts-20200910-18.fc39.noarch 400/591 Verifying : urw-base35-d050000l-fonts-20200910-18.fc39.noarc 401/591 Verifying : urw-base35-fonts-20200910-18.fc39.noarch 402/591 Verifying : 
urw-base35-fonts-common-20200910-18.fc39.noarch 403/591 Verifying : urw-base35-gothic-fonts-20200910-18.fc39.noarch 404/591 Verifying : urw-base35-nimbus-mono-ps-fonts-20200910-18.fc39 405/591 Verifying : urw-base35-nimbus-roman-fonts-20200910-18.fc39.n 406/591 Verifying : urw-base35-nimbus-sans-fonts-20200910-18.fc39.no 407/591 Verifying : urw-base35-p052-fonts-20200910-18.fc39.noarch 408/591 Verifying : urw-base35-standard-symbols-ps-fonts-20200910-18 409/591 Verifying : urw-base35-z003-fonts-20200910-18.fc39.noarch 410/591 Verifying : utf8proc-2.7.0-5.fc39.aarch64 411/591 Verifying : vapoursynth-libs-63-2.fc39.aarch64 412/591 Verifying : vo-amrwbenc-0.1.3-19.fc39.aarch64 413/591 Verifying : vtk-9.2.6-7.fc39.aarch64 414/591 Verifying : xapian-core-libs-1.4.23-1.fc39.aarch64 415/591 Verifying : xcb-util-0.4.1-3.fc39.aarch64 416/591 Verifying : xcb-util-image-0.4.1-3.fc39.aarch64 417/591 Verifying : xcb-util-keysyms-0.4.1-3.fc39.aarch64 418/591 Verifying : xcb-util-renderutil-0.3.10-3.fc39.aarch64 419/591 Verifying : xcb-util-wm-0.4.2-3.fc39.aarch64 420/591 Verifying : xml-common-0.6.3-61.fc39.noarch 421/591 Verifying : xorg-x11-proto-devel-2023.2-2.fc39.noarch 422/591 Verifying : xvidcore-1.3.7-10.fc39.aarch64 423/591 Verifying : zeromq-4.3.4-8.fc39.aarch64 424/591 Verifying : zeromq-devel-4.3.4-8.fc39.aarch64 425/591 Verifying : zlib-devel-1.2.13-4.fc39.aarch64 426/591 Verifying : zvbi-0.2.35-21.fc39.aarch64 427/591 Verifying : alsa-lib-1.2.11-2.fc39.aarch64 428/591 Verifying : annobin-docs-12.46-1.fc39.noarch 429/591 Verifying : annobin-plugin-gcc-12.46-1.fc39.aarch64 430/591 Verifying : armadillo-12.8.1-1.fc39.aarch64 431/591 Verifying : arpack-3.9.1-1.fc39.aarch64 432/591 Verifying : blosc-1.21.5-2.fc39.aarch64 433/591 Verifying : cpp-13.2.1-7.fc39.aarch64 434/591 Verifying : crypto-policies-scripts-20231204-1.git1e3a2e4.fc 435/591 Verifying : cups-libs-1:2.4.7-11.fc39.aarch64 436/591 Verifying : dbus-broker-35-2.fc39.aarch64 437/591 Verifying : emacs-filesystem-1:29.3-1.fc39.noarch 438/591 Verifying : expat-2.6.2-1.fc39.aarch64 439/591 Verifying : fftw-3.3.10-10.fc39.aarch64 440/591 Verifying : fftw-devel-3.3.10-10.fc39.aarch64 441/591 Verifying : fftw-libs-3.3.10-10.fc39.aarch64 442/591 Verifying : fftw-libs-double-3.3.10-10.fc39.aarch64 443/591 Verifying : fftw-libs-long-3.3.10-10.fc39.aarch64 444/591 Verifying : fftw-libs-single-3.3.10-10.fc39.aarch64 445/591 Verifying : flexiblas-3.4.2-1.fc39.aarch64 446/591 Verifying : flexiblas-netlib-3.4.2-1.fc39.aarch64 447/591 Verifying : flexiblas-netlib64-3.4.2-1.fc39.aarch64 448/591 Verifying : flexiblas-openblas-openmp-3.4.2-1.fc39.aarch64 449/591 Verifying : flexiblas-openblas-openmp64-3.4.2-1.fc39.aarch64 450/591 Verifying : fontconfig-2.14.2-6.fc39.aarch64 451/591 Verifying : gcc-13.2.1-7.fc39.aarch64 452/591 Verifying : gcc-c++-13.2.1-7.fc39.aarch64 453/591 Verifying : gcc-plugin-annobin-13.2.1-7.fc39.aarch64 454/591 Verifying : gdal-libs-3.7.3-4.fc39.aarch64 455/591 Verifying : geos-3.12.1-1.fc39.aarch64 456/591 Verifying : giflib-5.2.2-1.fc39.aarch64 457/591 Verifying : git-2.44.0-1.fc39.aarch64 458/591 Verifying : git-core-2.44.0-1.fc39.aarch64 459/591 Verifying : git-core-doc-2.44.0-1.fc39.noarch 460/591 Verifying : glib2-2.78.3-1.fc39.aarch64 461/591 Verifying : gnutls-3.8.4-1.fc39.aarch64 462/591 Verifying : google-noto-fonts-common-20240101-1.fc39.noarch 463/591 Verifying : google-noto-sans-vf-fonts-20240101-1.fc39.noarch 464/591 Verifying : graphviz-8.1.0-6.fc39.aarch64 465/591 Verifying : groff-base-1.23.0-3.fc39.aarch64 
466/591 Verifying : gstreamer1-1.22.9-1.fc39.aarch64 467/591 Verifying : gstreamer1-plugins-base-1.22.9-1.fc39.aarch64 468/591 Verifying : highway-1.1.0-1.fc39.aarch64 469/591 Verifying : imath-3.1.10-1.fc39.aarch64 470/591 Verifying : java-17-openjdk-headless-1:17.0.9.0.9-3.fc39.aar 471/591 Verifying : kernel-headers-6.8.3-200.fc39.aarch64 472/591 Verifying : keyutils-libs-devel-1.6.3-1.fc39.aarch64 473/591 Verifying : krb5-devel-1.21.2-3.fc39.aarch64 474/591 Verifying : libXpm-3.5.17-1.fc39.aarch64 475/591 Verifying : libaec-1.1.2-1.fc39.aarch64 476/591 Verifying : libaom-3.8.2-1.fc39.aarch64 477/591 Verifying : libarrow-13.0.0-4.fc39.aarch64 478/591 Verifying : libarrow-doc-13.0.0-4.fc39.noarch 479/591 Verifying : libasan-13.2.1-7.fc39.aarch64 480/591 Verifying : libatomic-13.2.1-7.fc39.aarch64 481/591 Verifying : libavcodec-free-6.1.1-3.fc39.aarch64 482/591 Verifying : libavformat-free-6.1.1-3.fc39.aarch64 483/591 Verifying : libavutil-free-6.1.1-3.fc39.aarch64 484/591 Verifying : libdeflate-1.20-1.fc39.aarch64 485/591 Verifying : libdrm-2.4.120-1.fc39.aarch64 486/591 Verifying : libgfortran-13.2.1-7.fc39.aarch64 487/591 Verifying : libgs-10.02.1-2.fc39.aarch64 488/591 Verifying : libimagequant-4.0.3-2.fc39.aarch64 489/591 Verifying : libinput-1.25.0-4.fc39.aarch64 490/591 Verifying : libkadm5-1.21.2-3.fc39.aarch64 491/591 Verifying : libnauty-2.8.8-1.fc39.aarch64 492/591 Verifying : libnl3-3.9.0-1.fc39.aarch64 493/591 Verifying : libopenmpt-0.6.12-1.fc39.aarch64 494/591 Verifying : liborc1-1.9.3-1.fc39.aarch64 495/591 Verifying : libproxy-0.5.3-3.fc39.aarch64 496/591 Verifying : librsvg2-2.57.1-1.fc39.aarch64 497/591 Verifying : libsmbclient-2:4.19.5-1.fc39.aarch64 498/591 Verifying : libsodium-1.0.18-15.fc39.aarch64 499/591 Verifying : libsodium-devel-1.0.18-15.fc39.aarch64 500/591 Verifying : libstdc++-devel-13.2.1-7.fc39.aarch64 501/591 Verifying : libswresample-free-6.1.1-3.fc39.aarch64 502/591 Verifying : libswscale-free-6.1.1-3.fc39.aarch64 503/591 Verifying : libubsan-13.2.1-7.fc39.aarch64 504/591 Verifying : liburing-2.5-1.fc39.aarch64 505/591 Verifying : libusb1-1.0.27-1.fc39.aarch64 506/591 Verifying : libuv-1:1.48.0-1.fc39.aarch64 507/591 Verifying : libuv-devel-1:1.48.0-1.fc39.aarch64 508/591 Verifying : libuv-static-1:1.48.0-1.fc39.aarch64 509/591 Verifying : libva-2.20.0-2.fc39.aarch64 510/591 Verifying : libvpx-1.13.1-1.fc39.aarch64 511/591 Verifying : libwacom-2.10.0-1.fc39.aarch64 512/591 Verifying : libwacom-data-2.10.0-1.fc39.noarch 513/591 Verifying : libwbclient-2:4.19.5-1.fc39.aarch64 514/591 Verifying : libxkbcommon-1.6.0-1.fc39.aarch64 515/591 Verifying : libxkbcommon-x11-1.6.0-1.fc39.aarch64 516/591 Verifying : libzstd-devel-1.5.6-1.fc39.aarch64 517/591 Verifying : llvm-libs-17.0.6-3.fc39.aarch64 518/591 Verifying : lmdb-0.9.32-1.fc39.aarch64 519/591 Verifying : lmdb-devel-0.9.32-1.fc39.aarch64 520/591 Verifying : lmdb-libs-0.9.32-1.fc39.aarch64 521/591 Verifying : mariadb-connector-c-3.3.8-1.fc39.aarch64 522/591 Verifying : mariadb-connector-c-config-3.3.8-1.fc39.noarch 523/591 Verifying : mbedtls-2.28.7-1.fc39.aarch64 524/591 Verifying : mesa-filesystem-23.3.6-1.fc39.aarch64 525/591 Verifying : mesa-libEGL-23.3.6-1.fc39.aarch64 526/591 Verifying : mesa-libGL-23.3.6-1.fc39.aarch64 527/591 Verifying : mesa-libgbm-23.3.6-1.fc39.aarch64 528/591 Verifying : mesa-libglapi-23.3.6-1.fc39.aarch64 529/591 Verifying : minizip-ng-3.0.7-5.fc39.aarch64 530/591 Verifying : ncurses-6.4-7.20230520.fc39.1.aarch64 531/591 Verifying : nspr-4.35.0-18.fc39.aarch64 532/591 
Verifying : nss-3.98.0-1.fc39.aarch64 533/591 Verifying : nss-softokn-3.98.0-1.fc39.aarch64 534/591 Verifying : nss-softokn-freebl-3.98.0-1.fc39.aarch64 535/591 Verifying : nss-sysinit-3.98.0-1.fc39.aarch64 536/591 Verifying : nss-util-3.98.0-1.fc39.aarch64 537/591 Verifying : opencl-headers-3.0-19.20231212git2368105.fc39.no 538/591 Verifying : openjpeg2-2.5.2-1.fc39.aarch64 539/591 Verifying : openssh-9.3p1-10.fc39.aarch64 540/591 Verifying : openssh-clients-9.3p1-10.fc39.aarch64 541/591 Verifying : perl-AutoLoader-5.74-502.fc39.noarch 542/591 Verifying : perl-B-1.88-502.fc39.aarch64 543/591 Verifying : perl-Class-Struct-0.68-502.fc39.noarch 544/591 Verifying : perl-DynaLoader-1.54-502.fc39.aarch64 545/591 Verifying : perl-Errno-1.37-502.fc39.aarch64 546/591 Verifying : perl-Fcntl-1.15-502.fc39.aarch64 547/591 Verifying : perl-File-Basename-2.86-502.fc39.noarch 548/591 Verifying : perl-File-Find-1.43-502.fc39.noarch 549/591 Verifying : perl-File-stat-1.13-502.fc39.noarch 550/591 Verifying : perl-FileHandle-2.05-502.fc39.noarch 551/591 Verifying : perl-Getopt-Std-1.13-502.fc39.noarch 552/591 Verifying : perl-Git-2.44.0-1.fc39.noarch 553/591 Verifying : perl-IO-1.52-502.fc39.aarch64 554/591 Verifying : perl-IPC-Open3-1.22-502.fc39.noarch 555/591 Verifying : perl-POSIX-2.13-502.fc39.aarch64 556/591 Verifying : perl-SelectSaver-1.02-502.fc39.noarch 557/591 Verifying : perl-Symbol-1.09-502.fc39.noarch 558/591 Verifying : perl-base-2.27-502.fc39.noarch 559/591 Verifying : perl-if-0.61.000-502.fc39.noarch 560/591 Verifying : perl-interpreter-4:5.38.2-502.fc39.aarch64 561/591 Verifying : perl-lib-0.65-502.fc39.aarch64 562/591 Verifying : perl-libs-4:5.38.2-502.fc39.aarch64 563/591 Verifying : perl-locale-1.10-502.fc39.noarch 564/591 Verifying : perl-mro-1.28-502.fc39.aarch64 565/591 Verifying : perl-overload-1.37-502.fc39.noarch 566/591 Verifying : perl-overloading-0.02-502.fc39.noarch 567/591 Verifying : perl-vars-1.05-502.fc39.noarch 568/591 Verifying : procps-ng-4.0.3-5.fc39.aarch64 569/591 Verifying : pyproject-rpm-macros-1.12.0-1.fc39.noarch 570/591 Verifying : python3-3.12.2-2.fc39.aarch64 571/591 Verifying : python3-devel-3.12.2-2.fc39.aarch64 572/591 Verifying : python3-libs-3.12.2-2.fc39.aarch64 573/591 Verifying : qt-settings-39.1-1.fc39.noarch 574/591 Verifying : qt5-qtbase-5.15.12-5.fc39.aarch64 575/591 Verifying : qt5-qtbase-common-5.15.12-5.fc39.noarch 576/591 Verifying : qt5-qtbase-gui-5.15.12-5.fc39.aarch64 577/591 Verifying : rav1e-libs-0.7.1-1.fc39.aarch64 578/591 Verifying : rsvg-pixbuf-loader-2.57.1-1.fc39.aarch64 579/591 Verifying : samba-client-libs-2:4.19.5-1.fc39.aarch64 580/591 Verifying : samba-common-2:4.19.5-1.fc39.noarch 581/591 Verifying : samba-common-libs-2:4.19.5-1.fc39.aarch64 582/591 Verifying : systemd-254.10-1.fc39.aarch64 583/591 Verifying : systemd-pam-254.10-1.fc39.aarch64 584/591 Verifying : systemd-rpm-macros-254.10-1.fc39.noarch 585/591 Verifying : tzdata-2024a-2.fc39.noarch 586/591 Verifying : tzdata-java-2024a-2.fc39.noarch 587/591 Verifying : vim-filesystem-2:9.1.264-1.fc39.noarch 588/591 Verifying : xerces-c-3.2.5-1.fc39.aarch64 589/591 Verifying : xkeyboard-config-2.40-1.fc39.noarch 590/591 Verifying : zimg-3.0.5-1.fc39.aarch64 591/591 Installed: Lmod-8.7.32-1.fc39.aarch64 MUMPS-5.5.1-5.fc39.aarch64 MUMPS-common-5.5.1-5.fc39.noarch SuperLU-6.0.0-1.fc39.aarch64 abattis-cantarell-vf-fonts-0.301-10.fc39.noarch adobe-mappings-cmap-20230622-1.fc39.noarch adobe-mappings-cmap-deprecated-20230622-1.fc39.noarch adobe-mappings-pdf-20190401-5.fc39.noarch 
alsa-lib-1.2.11-2.fc39.aarch64 annobin-docs-12.46-1.fc39.noarch annobin-plugin-gcc-12.46-1.fc39.aarch64 armadillo-12.8.1-1.fc39.aarch64 arpack-3.9.1-1.fc39.aarch64 asmjit-1:0-20220702.1.gitc5984762.fc39.aarch64 asmjit-devel-1:0-20220702.1.gitc5984762.fc39.aarch64 avahi-libs-0.8-24.fc39.aarch64 blosc-1.21.5-2.fc39.aarch64 byte-buddy-1.14.2-2.fc39.noarch byte-buddy-agent-1.14.2-2.fc39.noarch cairo-1.18.0-1.fc39.aarch64 cairo-gobject-1.18.0-1.fc39.aarch64 cdparanoia-libs-10.2-42.fc39.aarch64 ceres-solver-2.1.0-6.fc39.aarch64 cfitsio-4.3.0-1.fc39.aarch64 cgnslib-libs-4.4.0-2.fc39.aarch64 cjson-1.7.15-2.fc39.aarch64 clang16-libs-16.0.6-3.fc39.aarch64 clang16-resource-filesystem-16.0.6-3.fc39.aarch64 cliquer-libs-1.22-6.fc39.aarch64 cmake-3.27.7-1.fc39.aarch64 cmake-data-3.27.7-1.fc39.noarch cmake-filesystem-3.27.7-1.fc39.aarch64 cmake-rpm-macros-3.27.7-1.fc39.noarch codec2-1.2.0-2.fc39.aarch64 coin-or-Cbc-2.10.5-13.fc39.aarch64 coin-or-Cgl-0.60.3-10.fc39.aarch64 coin-or-Clp-1.17.6-13.fc39.aarch64 coin-or-CoinUtils-2.11.4-10.fc39.aarch64 coin-or-Osi-0.108.6-9.fc39.aarch64 copy-jdk-configs-4.1-3.fc39.noarch cpp-13.2.1-7.fc39.aarch64 cpuinfo-1:0-20240327.0.gitf42f5eaf.fc39.aarch64 cpuinfo-devel-1:0-20240327.0.gitf42f5eaf.fc39.aarch64 crypto-policies-scripts-20231204-1.git1e3a2e4.fc39.noarch cuda-cccl-12-3-12.3.101-1.aarch64 cuda-crt-12-3-12.3.107-1.aarch64 cuda-cudart-12-3-12.3.101-1.aarch64 cuda-cudart-devel-12-3-12.3.101-1.aarch64 cuda-cupti-12-3-12.3.101-1.aarch64 cuda-driver-devel-12-3-12.3.101-1.aarch64 cuda-gcc-12-12.3.1-1.fc39.aarch64 cuda-gcc-12-c++-12.3.1-1.fc39.aarch64 cuda-nvcc-12-3-12.3.107-1.aarch64 cuda-nvml-devel-12-3-12.3.101-1.aarch64 cuda-nvrtc-12-3-12.3.107-1.aarch64 cuda-nvrtc-devel-12-3-12.3.107-1.aarch64 cuda-nvtx-12-3-12.3.101-1.aarch64 cuda-nvvm-12-3-12.3.107-1.aarch64 cuda-profiler-api-12-3-12.3.101-1.aarch64 cuda-toolkit-12-3-config-common-12.3.101-1.noarch cuda-toolkit-12-config-common-12.4.127-1.noarch cuda-toolkit-config-common-12.4.127-1.noarch cups-libs-1:2.4.7-11.fc39.aarch64 cutlass-3.4.1-20240215.0.cu12_3.fc39.aarch64 cutlass-devel-3.4.1-20240215.0.cu12_3.fc39.aarch64 dbus-1:1.14.10-1.fc39.aarch64 dbus-broker-35-2.fc39.aarch64 dbus-common-1:1.14.10-1.fc39.noarch dbus-libs-1:1.14.10-1.fc39.aarch64 default-fonts-core-sans-4.0-9.fc39.noarch double-conversion-3.1.5-9.fc39.aarch64 doxygen-2:1.9.7-3.fc39.aarch64 duktape-2.7.0-5.fc39.aarch64 eigen3-devel-3.4.0-12.fc39.noarch emacs-filesystem-1:29.3-1.fc39.noarch expat-2.6.2-1.fc39.aarch64 fdk-aac-free-2.0.0-11.fc39.aarch64 fftw-3.3.10-10.fc39.aarch64 fftw-devel-3.3.10-10.fc39.aarch64 fftw-libs-3.3.10-10.fc39.aarch64 fftw-libs-double-3.3.10-10.fc39.aarch64 fftw-libs-long-3.3.10-10.fc39.aarch64 fftw-libs-single-3.3.10-10.fc39.aarch64 flatbuffers-23.5.26-3.fc39.aarch64 flatbuffers-compiler-23.5.26-3.fc39.aarch64 flatbuffers-devel-23.5.26-3.fc39.aarch64 flexiblas-3.4.2-1.fc39.aarch64 flexiblas-netlib-3.4.2-1.fc39.aarch64 flexiblas-netlib64-3.4.2-1.fc39.aarch64 flexiblas-openblas-openmp-3.4.2-1.fc39.aarch64 flexiblas-openblas-openmp64-3.4.2-1.fc39.aarch64 fontconfig-2.14.2-6.fc39.aarch64 fonts-filesystem-1:2.0.5-12.fc39.noarch foxi-0-20210526.1.gitc278588e.fc37.aarch64 foxi-devel-0-20210526.1.gitc278588e.fc37.aarch64 fp16-1:0-20240410.0.git581ac1c7.fc39.aarch64 fp16-devel-1:0-20240410.0.git581ac1c7.fc39.aarch64 freetype-2.13.1-2.fc39.aarch64 freexl-2.0.0-2.fc39.aarch64 fribidi-1.0.13-2.fc39.aarch64 fxdiv-devel-1:0-20201208.1.git63058eff.fc39.noarch game-music-emu-0.6.3-12.fc39.aarch64 gc-8.2.2-4.fc39.aarch64 
gcc-13.2.1-7.fc39.aarch64 gcc-c++-13.2.1-7.fc39.aarch64 gcc-plugin-annobin-13.2.1-7.fc39.aarch64 gd-2.3.3-12.fc39.aarch64 gdal-libs-3.7.3-4.fc39.aarch64 gdk-pixbuf2-2.42.10-5.fc39.aarch64 gdk-pixbuf2-modules-2.42.10-5.fc39.aarch64 gecode-6.2.0-12.fc39.aarch64 gemmlowp-devel-0-20231104.0.git16e8662c.fc39.noarch geos-3.12.1-1.fc39.aarch64 gflags-2.2.2-12.fc39.aarch64 gflags-devel-2.2.2-12.fc39.aarch64 giflib-5.2.2-1.fc39.aarch64 git-2.44.0-1.fc39.aarch64 git-core-2.44.0-1.fc39.aarch64 git-core-doc-2.44.0-1.fc39.noarch gklib-5.1.1-20230326.0.git8bd6bad7.fc39.aarch64 gl-manpages-1.1-28.20190306.fc39.noarch glib2-2.78.3-1.fc39.aarch64 glibc-devel-2.38-99.fc39.aarch64 glog-0.3.5-18.fc39.aarch64 glog-devel-0.3.5-18.fc39.aarch64 gloo-1:0.5.0-20240302.0.git2565674c.cu12_3.fc39.aarch64 gloo-devel-1:0.5.0-20240302.0.git2565674c.cu12_3.fc39.aarch64 glpk-5.0-7.fc39.aarch64 glx-utils-9.0.0-3.fc39.aarch64 gmp-c++-1:6.2.1-5.fc39.aarch64 gmp-devel-1:6.2.1-5.fc39.aarch64 gnutls-3.8.4-1.fc39.aarch64 google-droid-sans-fonts-20200215-17.fc39.noarch google-noto-fonts-common-20240101-1.fc39.noarch google-noto-sans-vf-fonts-20240101-1.fc39.noarch graphene-1.10.6-6.fc39.aarch64 graphite2-1.3.14-12.fc39.aarch64 graphviz-8.1.0-6.fc39.aarch64 groff-base-1.23.0-3.fc39.aarch64 gsl-2.7.1-5.fc39.aarch64 gsm-1.0.22-3.fc39.aarch64 gstreamer1-1.22.9-1.fc39.aarch64 gstreamer1-plugins-base-1.22.9-1.fc39.aarch64 gts-0.7.6-46.20121130.fc39.aarch64 guile22-2.2.7-9.fc39.aarch64 halide-17.0.1-20240220.0.fc39.aarch64 harfbuzz-8.2.1-2.fc39.aarch64 hdf-libs-4.2.15-13.fc39.aarch64 hdf5-1.12.1-12.fc39.aarch64 highway-1.1.0-1.fc39.aarch64 hiredis-1.0.2-5.fc39.aarch64 hiredis-devel-1.0.2-5.fc39.aarch64 ilbc-3.0.4-7.fc39.aarch64 imath-3.1.10-1.fc39.aarch64 infiniband-diags-46.0-4.fc39.aarch64 isl-0.16.1-18.fc39.aarch64 iso-codes-4.15.0-2.fc39.noarch jacop-4.9.0-2.fc39.noarch java-17-openjdk-headless-1:17.0.9.0.9-3.fc39.aarch64 javapackages-filesystem-6.1.0-10.fc39.noarch javapackages-tools-6.1.0-10.fc39.noarch jbig2dec-libs-0.19-10.fc39.aarch64 jbigkit-libs-2.1-26.fc39.aarch64 json-c-0.17-1.fc39.aarch64 jsoncpp-1.9.5-5.fc39.aarch64 kernel-headers-6.8.3-200.fc39.aarch64 keyutils-libs-devel-1.6.3-1.fc39.aarch64 kineto-0.4.0-20240327.0.git445909a8.cu12_3.fc39.aarch64 kineto-devel-0.4.0-20240327.0.git445909a8.cu12_3.fc39.aarch64 kmod-libs-30-6.fc39.aarch64 krb5-devel-1.21.2-3.fc39.aarch64 lame-libs-3.100-15.fc39.aarch64 lasi-1.1.3-11.fc39.aarch64 lcms2-2.15-2.fc39.aarch64 less-633-2.fc39.aarch64 leveldb-1.23-7.fc39.aarch64 leveldb-devel-1.23-7.fc39.aarch64 libGLEW-2.2.0-5.fc39.aarch64 libICE-1.0.10-11.fc39.aarch64 libSM-1.2.3-13.fc39.aarch64 libX11-1.8.7-1.fc39.aarch64 libX11-common-1.8.7-1.fc39.noarch libX11-devel-1.8.7-1.fc39.aarch64 libX11-xcb-1.8.7-1.fc39.aarch64 libXau-1.0.11-3.fc39.aarch64 libXau-devel-1.0.11-3.fc39.aarch64 libXcursor-1.2.1-4.fc39.aarch64 libXext-1.3.5-3.fc39.aarch64 libXfixes-6.0.0-6.fc39.aarch64 libXft-2.3.8-3.fc39.aarch64 libXi-1.8.1-2.fc39.aarch64 libXpm-3.5.17-1.fc39.aarch64 libXrender-0.9.11-3.fc39.aarch64 libXt-1.2.1-5.fc39.aarch64 libXv-1.0.11-19.fc39.aarch64 libXxf86vm-1.1.5-3.fc39.aarch64 libaec-1.1.2-1.fc39.aarch64 libaom-3.8.2-1.fc39.aarch64 libarrow-13.0.0-4.fc39.aarch64 libarrow-doc-13.0.0-4.fc39.noarch libasan-13.2.1-7.fc39.aarch64 libatomic-13.2.1-7.fc39.aarch64 libavcodec-free-6.1.1-3.fc39.aarch64 libavformat-free-6.1.1-3.fc39.aarch64 libavif-0.11.1-11.fc39.aarch64 libavutil-free-6.1.1-3.fc39.aarch64 libb2-0.98.1-9.fc39.aarch64 libbluray-1.3.4-3.fc39.aarch64 libcbor-0.10.2-2.fc39.aarch64 
libchromaprint-1.5.1-13.fc39.aarch64 libcom_err-devel-1.47.0-2.fc39.aarch64 libcublas-12-3-12.3.4.1-2.aarch64 libcublas-devel-12-3-12.3.4.1-2.aarch64 libcudnn8-8.9.7.29-2.cuda12.3.aarch64 libcudnn8-devel-8.9.7.29-2.cuda12.3.aarch64 libcufft-12-3-11.0.12.1-2.aarch64 libcufft-devel-12-3-11.0.12.1-2.aarch64 libcurand-12-3-10.3.4.107-1.aarch64 libcurand-devel-12-3-10.3.4.107-1.aarch64 libcusolver-12-3-11.5.4.101-2.aarch64 libcusolver-devel-12-3-11.5.4.101-2.aarch64 libcusparse-12-3-12.2.0.103-2.aarch64 libcusparse-devel-12-3-12.2.0.103-2.aarch64 libdatrie-0.2.13-7.fc39.aarch64 libdav1d-1.2.1-2.fc39.aarch64 libdc1394-2.2.7-3.fc39.aarch64 libdeflate-1.20-1.fc39.aarch64 libdrm-2.4.120-1.fc39.aarch64 libedit-3.1-48.20230828cvs.fc39.aarch64 libevdev-1.13.1-2.fc39.aarch64 libfido2-1.13.0-3.fc39.aarch64 libgcrypt-1.10.2-2.fc39.aarch64 libgeotiff-1.7.1-9.fc39.aarch64 libgfortran-13.2.1-7.fc39.aarch64 libglvnd-1:1.7.0-1.fc39.aarch64 libglvnd-core-devel-1:1.7.0-1.fc39.aarch64 libglvnd-devel-1:1.7.0-1.fc39.aarch64 libglvnd-egl-1:1.7.0-1.fc39.aarch64 libglvnd-gles-1:1.7.0-1.fc39.aarch64 libglvnd-glx-1:1.7.0-1.fc39.aarch64 libglvnd-opengl-1:1.7.0-1.fc39.aarch64 libgpg-error-1.47-2.fc39.aarch64 libgs-10.02.1-2.fc39.aarch64 libgta-1.2.1-10.fc39.aarch64 libgudev-238-2.fc39.aarch64 libharu-2.4.3-3.fc39.aarch64 libibumad-46.0-4.fc39.aarch64 libibverbs-46.0-4.fc39.aarch64 libicu-73.2-2.fc39.aarch64 libijs-0.35-19.fc39.aarch64 libimagequant-4.0.3-2.fc39.aarch64 libinput-1.25.0-4.fc39.aarch64 libjpeg-turbo-2.1.4-3.fc39.aarch64 libjxl-1:0.8.2-3.fc39.aarch64 libkadm5-1.21.2-3.fc39.aarch64 libkml-1.3.0-45.fc39.aarch64 libldb-2.8.0-1.fc39.aarch64 liblerc-4.0.0-4.fc39.aarch64 libmodplug-1:0.8.9.0-17.fc39.aarch64 libmpc-1.3.1-3.fc39.aarch64 libnauty-2.8.8-1.fc39.aarch64 libnccl-2.21.5-1+cuda12.4.aarch64 libnccl-devel-2.21.5-1+cuda12.4.aarch64 libnl3-3.9.0-1.fc39.aarch64 libnpp-12-3-12.2.3.2-2.aarch64 libnvjitlink-12-3-12.3.101-1.aarch64 libnvjitlink-devel-12-3-12.3.101-1.aarch64 libogg-2:1.3.5-6.fc39.aarch64 libopenmpt-0.6.12-1.fc39.aarch64 liborc1-1.9.3-1.fc39.aarch64 libpaper-1:2.1.1-1.fc39.aarch64 libpng-2:1.6.37-15.fc39.aarch64 libpq-15.3-1.fc39.aarch64 libproxy-0.5.3-3.fc39.aarch64 libqhull_r-1:7.2.1-13.fc39.aarch64 librabbitmq-0.13.0-3.fc39.aarch64 libraw1394-2.1.2-18.fc39.aarch64 librdmacm-46.0-4.fc39.aarch64 librist-0.2.7-2.fc39.aarch64 librsvg2-2.57.1-1.fc39.aarch64 librttopo-1.1.0-12.fc39.aarch64 libseccomp-2.5.3-6.fc39.aarch64 libselinux-devel-3.5-5.fc39.aarch64 libsepol-devel-3.5-2.fc39.aarch64 libsmbclient-2:4.19.5-1.fc39.aarch64 libsodium-1.0.18-15.fc39.aarch64 libsodium-devel-1.0.18-15.fc39.aarch64 libspatialite-5.0.1-23.fc39.aarch64 libstdc++-devel-13.2.1-7.fc39.aarch64 libswresample-free-6.1.1-3.fc39.aarch64 libswscale-free-6.1.1-3.fc39.aarch64 libtalloc-2.4.1-1.fc39.aarch64 libtdb-1.4.9-1.fc39.aarch64 libtevent-0.15.0-1.fc39.aarch64 libthai-0.1.29-6.fc39.aarch64 libtheora-1:1.1.1-34.fc39.aarch64 libtiff-4.4.0-8.fc39.aarch64 libtool-ltdl-2.4.7-7.fc39.aarch64 libubsan-13.2.1-7.fc39.aarch64 libudfread-1.1.2-6.fc39.aarch64 libunwind-1.7.0-0.2.rc2.fc39.aarch64 libunwind-devel-1.7.0-0.2.rc2.fc39.aarch64 liburing-2.5-1.fc39.aarch64 libusb1-1.0.27-1.fc39.aarch64 libuv-1:1.48.0-1.fc39.aarch64 libuv-devel-1:1.48.0-1.fc39.aarch64 libuv-static-1:1.48.0-1.fc39.aarch64 libva-2.20.0-2.fc39.aarch64 libvdpau-1.5-4.fc39.aarch64 libverto-devel-0.3.2-6.fc39.aarch64 libvisual-1:0.4.1-2.fc39.aarch64 libvorbis-1:1.3.7-8.fc39.aarch64 libvpx-1.13.1-1.fc39.aarch64 libwacom-2.10.0-1.fc39.aarch64 
libwacom-data-2.10.0-1.fc39.noarch libwayland-client-1.22.0-2.fc39.aarch64 libwayland-cursor-1.22.0-2.fc39.aarch64 libwayland-egl-1.22.0-2.fc39.aarch64 libwayland-server-1.22.0-2.fc39.aarch64 libwbclient-2:4.19.5-1.fc39.aarch64 libwebp-1.3.2-2.fc39.aarch64 libxcb-1.13.1-12.fc39.aarch64 libxcb-devel-1.13.1-12.fc39.aarch64 libxcrypt-devel-4.4.36-2.fc39.aarch64 libxkbcommon-1.6.0-1.fc39.aarch64 libxkbcommon-x11-1.6.0-1.fc39.aarch64 libxshmfence-1.3-13.fc39.aarch64 libyaml-0.2.5-12.fc39.aarch64 libzstd-devel-1.5.6-1.fc39.aarch64 lksctp-tools-1.0.19-4.fc39.aarch64 llvm-libs-17.0.6-3.fc39.aarch64 llvm16-libs-16.0.6-5.fc39.aarch64 lmdb-0.9.32-1.fc39.aarch64 lmdb-devel-0.9.32-1.fc39.aarch64 lmdb-libs-0.9.32-1.fc39.aarch64 lpcnetfreedv-0.5-3.fc39.aarch64 lua-5.4.6-3.fc39.aarch64 lua-filesystem-1.8.0-9.fc39.aarch64 lua-json-1.3.4-4.fc39.noarch lua-lpeg-1.0.2-11.fc39.aarch64 lua-posix-36.2.1-3.fc39.aarch64 lua-term-0.07-18.fc39.aarch64 magma-2.8.0-20240328.0.cu12_3.fc39.aarch64 magma-devel-2.8.0-20240328.0.cu12_3.fc39.aarch64 make-1:4.4.1-2.fc39.aarch64 mariadb-connector-c-3.3.8-1.fc39.aarch64 mariadb-connector-c-config-3.3.8-1.fc39.noarch mbedtls-2.28.7-1.fc39.aarch64 mesa-filesystem-23.3.6-1.fc39.aarch64 mesa-libEGL-23.3.6-1.fc39.aarch64 mesa-libGL-23.3.6-1.fc39.aarch64 mesa-libGLU-9.0.3-1.fc39.aarch64 mesa-libGLU-devel-9.0.3-1.fc39.aarch64 mesa-libgbm-23.3.6-1.fc39.aarch64 mesa-libglapi-23.3.6-1.fc39.aarch64 metis-5.2.1-20230403.0.gite0f1b88b.fc39.aarch64 miniz-3.0.2-3.fc39.aarch64 miniz-devel-3.0.2-3.fc39.aarch64 minizip-ng-3.0.7-5.fc39.aarch64 mockito-3.12.4-7.fc39.noarch mp-3.1.0-42.20200303git7fd4828.fc39.aarch64 mpdecimal-2.5.1-7.fc39.aarch64 mpfr-devel-4.2.0-3.fc39.aarch64 mpg123-libs-1.31.3-2.fc39.aarch64 mtdev-1.1.6-6.fc39.aarch64 ncurses-6.4-7.20230520.fc39.1.aarch64 neon2sse-devel-0-20230131.0.git097a5eca.fc38.noarch netcdf-4.9.0-5.fc38.aarch64 netpbm-11.02.00-2.fc39.aarch64 nettle-3.9.1-2.fc39.aarch64 nnpack-0-20230201.0.git70a77f48.fc38.aarch64 nnpack-devel-0-20230201.0.git70a77f48.fc38.aarch64 nspr-4.35.0-18.fc39.aarch64 nss-3.98.0-1.fc39.aarch64 nss-softokn-3.98.0-1.fc39.aarch64 nss-softokn-freebl-3.98.0-1.fc39.aarch64 nss-sysinit-3.98.0-1.fc39.aarch64 nss-util-3.98.0-1.fc39.aarch64 numactl-devel-2.0.16-3.fc39.aarch64 numactl-libs-2.0.16-3.fc39.aarch64 objectweb-asm-9.5-2.fc39.noarch objenesis-3.3-3.fc39.noarch ocl-icd-2.3.2-2.fc39.aarch64 ocl-icd-devel-2.3.2-2.fc39.aarch64 ogdi-4.1.0-11.fc39.aarch64 onnx-devel-1.17.0-20240404.0.git4128a090.fc39.aarch64 onnx-libs-1.17.0-20240404.0.git4128a090.fc39.aarch64 onnx-optimizer-0.3.19-20240303.0.gitb3a46118.fc39.aarch64 onnx-optimizer-devel-0.3.19-20240303.0.gitb3a46118.fc39.aarch64 openblas-0.3.21-6.fc39.aarch64 openblas-devel-0.3.21-6.fc39.aarch64 openblas-openmp-0.3.21-6.fc39.aarch64 openblas-openmp64-0.3.21-6.fc39.aarch64 openblas-openmp64_-0.3.21-6.fc39.aarch64 openblas-serial-0.3.21-6.fc39.aarch64 openblas-serial64-0.3.21-6.fc39.aarch64 openblas-serial64_-0.3.21-6.fc39.aarch64 openblas-threads-0.3.21-6.fc39.aarch64 openblas-threads64-0.3.21-6.fc39.aarch64 openblas-threads64_-0.3.21-6.fc39.aarch64 opencl-headers-3.0-19.20231212git2368105.fc39.noarch opencore-amr-0.1.6-4.fc39.aarch64 opencv-4.9.0-20231227.1.cu12_3.fc39.aarch64 opencv-contrib-4.9.0-20231227.1.cu12_3.fc39.aarch64 opencv-core-4.9.0-20231227.1.cu12_3.fc39.aarch64 opencv-cuda-4.9.0-20231227.1.cu12_3.fc39.aarch64 opencv-devel-4.9.0-20231227.1.cu12_3.fc39.aarch64 opencv-static-4.9.0-20231227.1.cu12_3.fc39.aarch64 openexr-libs-3.1.10-2.fc39.aarch64 openjpeg2-2.5.2-1.fc39.aarch64 
openpgm-5.2.122-32.fc39.aarch64 openpgm-devel-5.2.122-32.fc39.aarch64 openslide-3.4.1-24.fc39.aarch64 openssh-9.3p1-10.fc39.aarch64 openssh-clients-9.3p1-10.fc39.aarch64 opentest4j-1.2.0-14.fc39.noarch opus-1.3.1-13.fc39.aarch64 orc-0.4.33-3.fc39.aarch64 pango-1.51.0-1.fc39.aarch64 pcre-8.45-1.fc39.4.aarch64 pcre2-devel-10.42-1.fc39.2.aarch64 pcre2-utf16-10.42-1.fc39.2.aarch64 pcre2-utf32-10.42-1.fc39.2.aarch64 peachpy-python3-0-20221113.1.git349e8f83.fc39.noarch perl-AutoLoader-5.74-502.fc39.noarch perl-B-1.88-502.fc39.aarch64 perl-Carp-1.54-500.fc39.noarch perl-Class-Struct-0.68-502.fc39.noarch perl-Data-Dumper-2.188-501.fc39.aarch64 perl-Digest-1.20-500.fc39.noarch perl-Digest-MD5-2.58-500.fc39.aarch64 perl-DynaLoader-1.54-502.fc39.aarch64 perl-Encode-4:3.19-500.fc39.aarch64 perl-Errno-1.37-502.fc39.aarch64 perl-Error-1:0.17029-13.fc39.noarch perl-Exporter-5.77-500.fc39.noarch perl-Fcntl-1.15-502.fc39.aarch64 perl-File-Basename-2.86-502.fc39.noarch perl-File-Find-1.43-502.fc39.noarch perl-File-Path-2.18-500.fc39.noarch perl-File-Temp-1:0.231.100-500.fc39.noarch perl-File-stat-1.13-502.fc39.noarch perl-FileHandle-2.05-502.fc39.noarch perl-Getopt-Long-1:2.54-500.fc39.noarch perl-Getopt-Std-1.13-502.fc39.noarch perl-Git-2.44.0-1.fc39.noarch perl-HTTP-Tiny-0.088-3.fc39.noarch perl-IO-1.52-502.fc39.aarch64 perl-IO-Socket-IP-0.42-1.fc39.noarch perl-IO-Socket-SSL-2.083-3.fc39.noarch perl-IPC-Open3-1.22-502.fc39.noarch perl-MIME-Base64-3.16-500.fc39.aarch64 perl-Mozilla-CA-20230801-1.fc39.noarch perl-Net-SSLeay-1.92-10.fc39.aarch64 perl-POSIX-2.13-502.fc39.aarch64 perl-PathTools-3.89-500.fc39.aarch64 perl-Pod-Escapes-1:1.07-500.fc39.noarch perl-Pod-Perldoc-3.28.01-501.fc39.noarch perl-Pod-Simple-1:3.45-4.fc39.noarch perl-Pod-Usage-4:2.03-500.fc39.noarch perl-Scalar-List-Utils-5:1.63-500.fc39.aarch64 perl-SelectSaver-1.02-502.fc39.noarch perl-Socket-4:2.037-3.fc39.aarch64 perl-Storable-1:3.32-500.fc39.aarch64 perl-Symbol-1.09-502.fc39.noarch perl-Term-ANSIColor-5.01-501.fc39.noarch perl-Term-Cap-1.18-500.fc39.noarch perl-TermReadKey-2.38-18.fc39.aarch64 perl-Text-ParseWords-3.31-500.fc39.noarch perl-Text-Tabs+Wrap-2023.0511-3.fc39.noarch perl-Time-Local-2:1.350-3.fc39.noarch perl-URI-5.21-1.fc39.noarch perl-base-2.27-502.fc39.noarch perl-constant-1.33-501.fc39.noarch perl-if-0.61.000-502.fc39.noarch perl-interpreter-4:5.38.2-502.fc39.aarch64 perl-lib-0.65-502.fc39.aarch64 perl-libnet-3.15-501.fc39.noarch perl-libs-4:5.38.2-502.fc39.aarch64 perl-locale-1.10-502.fc39.noarch perl-mro-1.28-502.fc39.aarch64 perl-overload-1.37-502.fc39.noarch perl-overloading-0.02-502.fc39.noarch perl-parent-1:0.241-500.fc39.noarch perl-podlators-1:5.01-500.fc39.noarch perl-vars-1.05-502.fc39.noarch pixman-0.42.2-2.fc39.aarch64 poppler-23.08.0-1.fc39.aarch64 poppler-data-0.4.11-5.fc39.noarch poppler-glib-23.08.0-1.fc39.aarch64 procps-ng-4.0.3-5.fc39.aarch64 proj-9.2.1-2.fc39.aarch64 proj-data-9.2.1-2.fc39.noarch protobuf-3.19.6-6.fc39.aarch64 protobuf-compat-3.21.9-2.fc39.aarch64 protobuf-compat-compiler-3.21.9-2.fc39.aarch64 protobuf-compat-devel-3.21.9-2.fc39.aarch64 psimd-devel-1:0-20200517.2.git072586a7.fc39.noarch pthreadpool-1:0.1-20240121.0.git178e3e06.fc39.aarch64 pthreadpool-devel-1:0.1-20240121.0.git178e3e06.fc39.aarch64 pugixml-1.13-3.fc39.aarch64 pybind11-devel-2.11.1-1.fc39.aarch64 pyproject-rpm-macros-1.12.0-1.fc39.noarch python-pip-wheel-23.2.1-1.fc39.noarch python-rpm-macros-3.12-4.fc39.noarch python3-3.12.2-2.fc39.aarch64 python3-devel-3.12.2-2.fc39.aarch64 python3-libs-3.12.2-2.fc39.aarch64 
python3-numpy-1:1.24.4-2.fc39.aarch64 python3-packaging-23.1-4.fc39.noarch python3-pybind11-2.11.1-1.fc39.aarch64 python3-pyyaml-6.0.1-11.fc39.aarch64 python3-rpm-generators-14-7.fc39.noarch python3-rpm-macros-3.12-4.fc39.noarch python3-setuptools-67.7.2-7.fc39.noarch python3-six-1.16.0-12.fc39.noarch python3-typing-extensions-4.8.0-1.fc39.noarch qnnpack-0-20190828.2.git7d2a4e99.fc38.aarch64 qnnpack-devel-0-20190828.2.git7d2a4e99.fc38.aarch64 qt-settings-39.1-1.fc39.noarch qt5-qtbase-5.15.12-5.fc39.aarch64 qt5-qtbase-common-5.15.12-5.fc39.noarch qt5-qtbase-gui-5.15.12-5.fc39.aarch64 rav1e-libs-0.7.1-1.fc39.aarch64 rdma-core-devel-46.0-4.fc39.aarch64 re2-1:20220601-3.fc39.aarch64 rhash-1.4.3-3.fc39.aarch64 rocksdb-8.1.1-2.fc39.aarch64 rocksdb-devel-8.1.1-2.fc39.aarch64 rsvg-pixbuf-loader-2.57.1-1.fc39.aarch64 samba-client-libs-2:4.19.5-1.fc39.aarch64 samba-common-2:4.19.5-1.fc39.noarch samba-common-libs-2:4.19.5-1.fc39.aarch64 scotch-7.0.3-3.fc39.aarch64 scotch-devel-7.0.3-3.fc39.aarch64 shared-mime-info-2.2-4.fc39.aarch64 sleef-3.6-20240320.0.git60e76d2b.fc39.aarch64 sleef-devel-3.6-20240320.0.git60e76d2b.fc39.aarch64 snappy-1.1.10-2.fc39.aarch64 snappy-devel-1.1.10-2.fc39.aarch64 soxr-0.1.3-14.fc39.aarch64 speex-1.2.0-15.fc39.aarch64 srt-libs-1.5.3-1.fc39.aarch64 suitesparse-5.13.0-3.fc39.aarch64 svt-av1-libs-1.4.1-3.fc39.aarch64 systemd-254.10-1.fc39.aarch64 systemd-pam-254.10-1.fc39.aarch64 systemd-rpm-macros-254.10-1.fc39.noarch tbb-2020.3-20.fc39.aarch64 tbb-devel-2020.3-20.fc39.aarch64 tcl-1:8.6.12-5.fc39.aarch64 tensorpipe-0-20220513.1.gitbb1473a4.fc37.aarch64 tensorpipe-devel-0-20220513.1.gitbb1473a4.fc37.aarch64 twolame-libs-0.4.0-3.fc39.aarch64 tzdata-2024a-2.fc39.noarch tzdata-java-2024a-2.fc39.noarch unixODBC-2.3.11-4.fc39.aarch64 uriparser-0.9.7-3.fc39.aarch64 urw-base35-bookman-fonts-20200910-18.fc39.noarch urw-base35-c059-fonts-20200910-18.fc39.noarch urw-base35-d050000l-fonts-20200910-18.fc39.noarch urw-base35-fonts-20200910-18.fc39.noarch urw-base35-fonts-common-20200910-18.fc39.noarch urw-base35-gothic-fonts-20200910-18.fc39.noarch urw-base35-nimbus-mono-ps-fonts-20200910-18.fc39.noarch urw-base35-nimbus-roman-fonts-20200910-18.fc39.noarch urw-base35-nimbus-sans-fonts-20200910-18.fc39.noarch urw-base35-p052-fonts-20200910-18.fc39.noarch urw-base35-standard-symbols-ps-fonts-20200910-18.fc39.noarch urw-base35-z003-fonts-20200910-18.fc39.noarch utf8proc-2.7.0-5.fc39.aarch64 vapoursynth-libs-63-2.fc39.aarch64 vim-filesystem-2:9.1.264-1.fc39.noarch vo-amrwbenc-0.1.3-19.fc39.aarch64 vtk-9.2.6-7.fc39.aarch64 xapian-core-libs-1.4.23-1.fc39.aarch64 xcb-util-0.4.1-3.fc39.aarch64 xcb-util-image-0.4.1-3.fc39.aarch64 xcb-util-keysyms-0.4.1-3.fc39.aarch64 xcb-util-renderutil-0.3.10-3.fc39.aarch64 xcb-util-wm-0.4.2-3.fc39.aarch64 xerces-c-3.2.5-1.fc39.aarch64 xkeyboard-config-2.40-1.fc39.noarch xml-common-0.6.3-61.fc39.noarch xorg-x11-proto-devel-2023.2-2.fc39.noarch xvidcore-1.3.7-10.fc39.aarch64 zeromq-4.3.4-8.fc39.aarch64 zeromq-devel-4.3.4-8.fc39.aarch64 zimg-3.0.5-1.fc39.aarch64 zlib-devel-1.2.13-4.fc39.aarch64 zvbi-0.2.35-21.fc39.aarch64 Complete! 
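With that, the buildroot is fully populated: 591 packages, including the CUDA 12.3 toolchain (cuda-nvcc, cuda-nvrtc), cuDNN 8.9, NCCL 2.21, MAGMA, and the system-packaged PyTorch dependencies (gloo, kineto, onnx, sleef, tensorpipe) that this spec links against instead of the bundled third_party copies. As a minimal sketch, one could inspect such a chroot after the fact with mock's shell mode; the config path and uniqueext below are taken from the mock invocation earlier in this log, and this assumes the chroot has not yet been cleaned:

  mock -r /var/lib/copr-rpmbuild/results/configs/child.cfg --uniqueext 1712885724.178146 --shell
  # inside the chroot, confirm the CUDA stack the build will pick up
  rpm -q cuda-nvcc-12-3 libcudnn8-devel libnccl-devel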
Finish: build setup for pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.src.rpm
Start: rpmbuild pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.src.rpm
warning: %patchN is deprecated (2 usages found), use %patch N (or %patch -P N)
Building target platforms: aarch64
Building for target aarch64
setting SOURCE_DATE_EPOCH=1554595200
Executing(%prep): /bin/sh -e /var/tmp/rpm-tmp.Zq8iiF
+ umask 022
+ cd /builddir/build/BUILD
+ cd /builddir/build/BUILD
+ rm -rf pytorch
+ /usr/bin/mkdir -p pytorch
+ cd pytorch
+ rm -rf /builddir/build/BUILD/pytorch-SPECPARTS
+ /usr/bin/mkdir -p /builddir/build/BUILD/pytorch-SPECPARTS
+ /usr/bin/chmod -Rf a+rX,u+w,g-w,o-w .
+ git clone --depth 1 -n -b main https://github.com/pytorch/pytorch.git .
Cloning into '.'...
+ git fetch --depth 1 origin 7efaf54dc46034189cb36b345764a5a9a5b693d4
From https://github.com/pytorch/pytorch
 * branch 7efaf54dc46034189cb36b345764a5a9a5b693d4 -> FETCH_HEAD
+ git reset --hard 7efaf54dc46034189cb36b345764a5a9a5b693d4
Updating files: 100% (18485/18485), done.
HEAD is now at 7efaf54 Fakeifying views shouldnt create symbols when dynamic=False (#123348)
+ git submodule update --init --depth 1 third_party/fmt
Submodule 'third_party/fmt' (https://github.com/fmtlib/fmt.git) registered for path 'third_party/fmt'
Cloning into '/builddir/build/BUILD/pytorch/third_party/fmt'...
From https://github.com/fmtlib/fmt
 * branch e69e5f977d458f2650bb346dadf2ad30c5320281 -> FETCH_HEAD
Submodule path 'third_party/fmt': checked out 'e69e5f977d458f2650bb346dadf2ad30c5320281'
+ git submodule update --init --depth 1 third_party/XNNPACK
Submodule 'third_party/XNNPACK' (https://github.com/google/XNNPACK.git) registered for path 'third_party/XNNPACK'
Cloning into '/builddir/build/BUILD/pytorch/third_party/XNNPACK'...
From https://github.com/google/XNNPACK
 * branch fcbf55af6cf28a4627bcd1f703ab7ad843f0f3a2 -> FETCH_HEAD
Submodule path 'third_party/XNNPACK': checked out 'fcbf55af6cf28a4627bcd1f703ab7ad843f0f3a2'
+ git submodule update --init --depth 1 third_party/ittapi
Submodule 'third_party/ittapi' (https://github.com/intel/ittapi.git) registered for path 'third_party/ittapi'
Cloning into '/builddir/build/BUILD/pytorch/third_party/ittapi'...
From https://github.com/intel/ittapi
 * branch 5b8a7d7422611c3a0d799fb5fc5dd4abfae35b42 -> FETCH_HEAD
Submodule path 'third_party/ittapi': checked out '5b8a7d7422611c3a0d799fb5fc5dd4abfae35b42'
+ git submodule update --init --depth 1 third_party/pocketfft
Submodule 'third_party/pocketfft' (https://github.com/mreineck/pocketfft) registered for path 'third_party/pocketfft'
Cloning into '/builddir/build/BUILD/pytorch/third_party/pocketfft'...
From https://github.com/mreineck/pocketfft
 * branch 9d3ab05a7fffbc71a492bc6a17be034e83e8f0fe -> FETCH_HEAD
Submodule path 'third_party/pocketfft': checked out '9d3ab05a7fffbc71a492bc6a17be034e83e8f0fe'
+ git submodule update --init --depth 1 third_party/cudnn_frontend
Submodule 'third_party/cudnn_frontend' (https://github.com/NVIDIA/cudnn-frontend.git) registered for path 'third_party/cudnn_frontend'
Cloning into '/builddir/build/BUILD/pytorch/third_party/cudnn_frontend'...
From https://github.com/NVIDIA/cudnn-frontend
 * branch 150798fe976556078f443fdb059a1ff0361f58a2 -> FETCH_HEAD
Submodule path 'third_party/cudnn_frontend': checked out '150798fe976556078f443fdb059a1ff0361f58a2'
+ git --no-pager log --format=fuller
commit 7efaf54dc46034189cb36b345764a5a9a5b693d4
Author:     Brian Hirsh
AuthorDate: Thu Apr 11 08:19:28 2024 -0700
Commit:     PyTorch MergeBot
CommitDate: Fri Apr 12 01:12:23 2024 +0000

    Fakeifying views shouldnt create symbols when dynamic=False (#123348)

    Fixes https://github.com/pytorch/pytorch/issues/123298

    I was also seeing some crashes in torchtrain due to dynamic shapes, even when I set `compile(dynamic=False)` (cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @chenyang78 @kadeng @chauhang @wanchaol). This doesn't fix the underlying dynamic shape issues with compile + DTensor, but it does prevent dynamic shapes from leaking in.

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/123348
    Approved by: https://github.com/ezyang
    ghstack dependencies: #122502, #122751

Patch #1 (pytorch-C.patch):
+ echo 'Patch #1 (pytorch-C.patch):'
+ /usr/bin/patch --no-backup-if-mismatch -f -p0 -b --suffix .python~ --fuzz=100
patching file torch/CMakeLists.txt
Hunk #1 succeeded at 277 (offset -2 lines).
+ echo 'Patch #5 (pytorch-cuda12.patch):'
Patch #5 (pytorch-cuda12.patch):
+ /usr/bin/patch --no-backup-if-mismatch -f -p1 -b --suffix .cu12~ --fuzz=100
patching file aten/src/ATen/native/nested/cuda/NestedTensorMatmul.cu
patching file aten/src/ATen/native/nested/cuda/NestedTensorTransformerFunctions.cu
patching file aten/src/ATen/native/transformers/cuda/attention.cu
Hunk #1 succeeded at 1 with fuzz 3.
patching file aten/src/ATen/native/transformers/cuda/attention_backward.cu
Hunk #1 succeeded at 1 with fuzz 3.
patching file aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernel_backward.h
Hunk #1 succeeded at 1 with fuzz 3.
patching file aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernel_forward.h
Hunk #1 succeeded at 1 with fuzz 3.
patching file aten/src/ATen/native/transformers/cuda/flash_attn/flash_bwd_launch_template.h
Hunk #1 succeeded at 1 with fuzz 3.
patching file aten/src/ATen/native/transformers/cuda/flash_attn/flash_fwd_launch_template.h
Hunk #1 succeeded at 1 with fuzz 3.
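Both patches are applied with --fuzz=100, which lets hunks land even when the surrounding context has drifted badly (the "succeeded ... with fuzz 3" messages above show that tolerance being used; Fedora's rpmbuild normally applies patches with zero fuzz). A minimal sketch of replaying the same step by hand, assuming the two patch files from the dist-git checkout sit one directory above the unpacked source tree (their actual location is not shown in this log):

  cd /builddir/build/BUILD/pytorch
  # Patch #1: strip level 0, backups suffixed .python~ (flags exactly as in the log above)
  /usr/bin/patch --no-backup-if-mismatch -f -p0 -b --suffix .python~ --fuzz=100 < ../pytorch-C.patch
  # Patch #5: strip level 1, backups suffixed .cu12~
  /usr/bin/patch --no-backup-if-mismatch -f -p1 -b --suffix .cu12~ --fuzz=100 < ../pytorch-cuda12.patch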
+ sed -i -e 's|VERSION_LESS 3.7)|VERSION_LESS 3.6)|g' cmake/Dependencies.cmake
+ sed -i -e 's|PY_MAJOR_VERSION == 3|PY_MAJOR_VERSION == 3 \&\& PY_MINOR_VERSION > 6|' torch/csrc/dynamo/eval_frame.c
+ sed -i 's|CMAKE_CXX_STANDARD 14|CMAKE_CXX_STANDARD 17|' CMakeLists.txt
+ sed -i -e 's|torch_cpu PUBLIC c10|torch_cpu PUBLIC c10 qnnpack gloo gloo_cuda|' caffe2/CMakeLists.txt
+ sed -i -e 's|USE_SYSTEM_BIND11|USE_SYSTEM_PYBIND11|g' cmake/Dependencies.cmake
+ rm -rf 'third_party/pthreadpool/*'
+ touch third_party/pthreadpool/CMakeLists.txt
+ sed -i -e 's|NAMES openblas|NAMES openblaso openblas|' cmake/Modules/FindOpenBLAS.cmake
+ sed -i -e 's|USE_ZSTD|NOT_USE_ZSTD|g' cmake/Dependencies.cmake
+ sed -i -e 's|add_subdirectory(zstd)|list(APPEND Caffe2_PUBLIC_DEPENDENCY_LIBS zstd)|g' caffe2/share/contrib/CMakeLists.txt
+ sed -i -e 's|Caffe2_DEPENDENCY_LIBS onnx_proto onnx|Caffe2_DEPENDENCY_LIBS onnx_proto onnx onnx_optimizer|' cmake/Dependencies.cmake
+ mkdir -p third_party/tensorpipe
+ echo ''
+ sed -i '/add_dependencies(tensorpipe_agent tensorpipe)/d' caffe2/CMakeLists.txt
+ echo ''
+ echo 'set(NNPACK_FOUND TRUE)'
+ sed -i '/TARGET cpuinfo PROPERTY/d' cmake/Dependencies.cmake
+ sed -i '/APPEND Caffe2_DEPENDENCY_LIBS fp16/d' cmake/Dependencies.cmake
+ mkdir -p third_party/QNNPACK
+ echo ''
+ sed -i '/TARGET qnnpack PROPERTY/d' cmake/Dependencies.cmake
+ sed -i -e '/target_compile_options(qnnpack/d' cmake/Dependencies.cmake
+ mkdir -p third_party/psimd
+ echo ''
+ sed -i '/pytorch_qnnpack PRIVATE psimd/d' aten/src/ATen/native/quantized/cpu/qnnpack/CMakeLists.txt
+ sed -i '/NOT TARGET fxdiv/,/endif/d' caffe2/CMakeLists.txt
+ sed -i '/torch_cpu PRIVATE fxdiv/d' caffe2/CMakeLists.txt
+ sed -i '/pytorch_qnnpack PRIVATE fxdiv/d' aten/src/ATen/native/quantized/cpu/qnnpack/CMakeLists.txt
+ mkdir -p third_party/fbgemm
+ echo ''
+ sed -i '/(TARGET fbgemm/d' cmake/Dependencies.cmake
+ sed -i 's|caffe2_fakelowp_ops fbgemm cpuinfo|caffe2_fakelowp_ops|' caffe2/contrib/fakelowp/CMakeLists.txt
+ sed -i 's|caffe2_dnnlowp_avx2_ops fbgemm|caffe2_dnnlowp_avx2_ops|' caffe2/quantization/server/CMakeLists.txt
+ mkdir -p third_party/foxi
+ echo ''
+ sed -i '/if(NOT TARGET kineto)/,/endif()/d' cmake/Dependencies.cmake
+ sed -i 's|libkineto/include|libkineto/include\n/usr/include/kineto|' torch/CMakeLists.txt
+ sed -i 's|libkineto/include|libkineto/include\n/usr/include/kineto|' caffe2/CMakeLists.txt
+ mkdir -p third_party/onnx-tensorrt
+ echo ''
+ sed -i /nvonnxparser_static/d cmake/Dependencies.cmake
+ sed -i 's|onnx_trt_library|nvonnxparser_static|g' cmake/Dependencies.cmake
+ rm -rf torch/csrc/jit/serialization/mobile_bytecode_generated.h
+ flatc --cpp --gen-mutable --scoped-enums -o torch/csrc/jit/serialization -c torch/csrc/jit/serialization/mobile_bytecode.fbs
+ echo '// @generated'
+ sed -i '/find_package(RocksDB CONFIG)/d' modules/rocksdb/CMakeLists.txt
+ sed -i 's|RocksDB::rocksdb|RocksDB::rocksdb-shared|' modules/rocksdb/CMakeLists.txt
+ mv -f cmake/Modules_CUDA_fix/FindCUDNN.cmake cmake/Modules
+ rm -rf cmake/Modules_CUDA_fix
+ find . -type d -name FindCUDA -exec rm -rf '{}' ';'
+ sed -i -e '/install/{:a;/COMPONENT/bb;N;ba;:b;/Modules_CUDA_fix/d;}' CMakeLists.txt
+ sed -i -e 's|CMAKE_CUDA_FLAGS "-D|CMAKE_CUDA_FLAGS " -D|' CMakeLists.txt
+ sed -i '/install(EXPORT Caffe2Targets/,/dev)/d' CMakeLists.txt
+ sed -i 's|SYSTEM ||g' c10/CMakeLists.txt
+ sed -i 's|SYSTEM ||g' torch/CMakeLists.txt
+ sed -i 's|SYSTEM ||g' caffe2/CMakeLists.txt
+ sed -i 's|BEFORE SYSTEM ||g' cmake/ProtoBuf.cmake
+ sed -i 's|AFTER SYSTEM ||g' cmake/Dependencies.cmake
+ sed -i 's|BEFORE SYSTEM ||g' cmake/Dependencies.cmake
+ sed -i 's|SYSTEM ||g' cmake/Dependencies.cmake
+ sed -i '1i #include ' c10/util/Registry.h
+ sed -i '1i #include ' c10/core/DispatchKey.h
+ sed -i '1i #include ' torch/csrc/jit/runtime/logging.cpp
+ sed -i '1i #include ' torch/csrc/lazy/core/multi_wait.cpp
+ sed -i '1i #include "stdint.h"' torch/csrc/jit/passes/quantization/quantization_type.h
+ RPM_EC=0
++ jobs -p
+ exit 0
Executing(%build): /bin/sh -e /var/tmp/rpm-tmp.eEm1Ml
+ umask 022
+ cd /builddir/build/BUILD
+ CFLAGS='-O2 -fexceptions -grecord-gcc-switches -pipe -Wall -Wno-complain-wrong-lang -Werror=format-security -Wp,-U_FORTIFY_SOURCE,-D_FORTIFY_SOURCE=3 -Wp,-D_GLIBCXX_ASSERTIONS -mbranch-protection=standard -fasynchronous-unwind-tables -fstack-clash-protection -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer -w -fpermissive -Wno-sign-compare -Wno-deprecated-declarations -Wno-nonnull -DEIGEN_HAS_CXX11_MATH=1 '
+ export CFLAGS
+ CXXFLAGS='-O2 -fexceptions -grecord-gcc-switches -pipe -Wall -Wno-complain-wrong-lang -Werror=format-security -Wp,-U_FORTIFY_SOURCE,-D_FORTIFY_SOURCE=3 -Wp,-D_GLIBCXX_ASSERTIONS -mbranch-protection=standard -fasynchronous-unwind-tables -fstack-clash-protection -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer -w -fpermissive -Wno-sign-compare -Wno-deprecated-declarations -Wno-nonnull -DEIGEN_HAS_CXX11_MATH=1 '
+ export CXXFLAGS
+ FFLAGS='-O2 -fexceptions -grecord-gcc-switches -pipe -Wall -Wno-complain-wrong-lang -Werror=format-security -Wp,-U_FORTIFY_SOURCE,-D_FORTIFY_SOURCE=3 -Wp,-D_GLIBCXX_ASSERTIONS -mbranch-protection=standard -fasynchronous-unwind-tables -fstack-clash-protection -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer -w -fpermissive -Wno-sign-compare -Wno-deprecated-declarations -Wno-nonnull -DEIGEN_HAS_CXX11_MATH=1 -I/usr/lib64/gfortran/modules '
+ export FFLAGS
+ FCFLAGS='-O2 -fexceptions -grecord-gcc-switches -pipe -Wall -Wno-complain-wrong-lang -Werror=format-security -Wp,-U_FORTIFY_SOURCE,-D_FORTIFY_SOURCE=3 -Wp,-D_GLIBCXX_ASSERTIONS -mbranch-protection=standard -fasynchronous-unwind-tables -fstack-clash-protection -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer -w -fpermissive -Wno-sign-compare -Wno-deprecated-declarations -Wno-nonnull -DEIGEN_HAS_CXX11_MATH=1 -I/usr/lib64/gfortran/modules '
+ export FCFLAGS
+ VALAFLAGS=-g
+ export VALAFLAGS
+ RUSTFLAGS='-Copt-level=3 -Cdebuginfo=2 -Ccodegen-units=1 -Cstrip=none -Cforce-frame-pointers=yes -Clink-arg=-specs=/usr/lib/rpm/redhat/redhat-package-notes --cap-lints=warn'
+ export RUSTFLAGS
+ LDFLAGS='-Wl,-z,relro -Wl,--as-needed -Wl,--build-id=sha1 -specs=/usr/lib/rpm/redhat/redhat-package-notes -Wl,-lstdc++'
+ export LDFLAGS
+ LT_SYS_LIBRARY_PATH=/usr/lib64:
+ export LT_SYS_LIBRARY_PATH
+ CC=gcc
+ export CC
+ CXX=g++
+ export CXX
+ cd pytorch
+ mkdir build
+ pushd build
~/build/BUILD/pytorch/build ~/build/BUILD/pytorch
+ export ONNX_ML=0
+ ONNX_ML=0
+ export BUILD_SPLIT_CUDA=ON
+ BUILD_SPLIT_CUDA=ON
+ export REL_WITH_DEB_INFO=1
+ REL_WITH_DEB_INFO=1
+ export TORCH_NVCC_FLAGS=-DCUDA_HAS_FP16
+ TORCH_NVCC_FLAGS=-DCUDA_HAS_FP16
+ export PYTHON_EXECUTABLE=/usr/bin/python3
+ PYTHON_EXECUTABLE=/usr/bin/python3
+ export LDFLAGS=-Wl,-lstdc++
+ LDFLAGS=-Wl,-lstdc++
+ export LD_LIBRARY_PATH=/usr/local/cuda-12.3/lib64/
+ LD_LIBRARY_PATH=/usr/local/cuda-12.3/lib64/
+ CFLAGS='-O2 -fexceptions -grecord-gcc-switches -pipe -Wall -Wno-complain-wrong-lang -Werror=format-security -Wp,-U_FORTIFY_SOURCE,-D_FORTIFY_SOURCE=3 -Wp,-D_GLIBCXX_ASSERTIONS -mbranch-protection=standard -fasynchronous-unwind-tables -fstack-clash-protection -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer -w -fpermissive -Wno-sign-compare -Wno-deprecated-declarations -Wno-nonnull -DEIGEN_HAS_CXX11_MATH=1 '
+ export CFLAGS
+ CXXFLAGS='-O2 -fexceptions -grecord-gcc-switches -pipe -Wall -Wno-complain-wrong-lang -Werror=format-security -Wp,-U_FORTIFY_SOURCE,-D_FORTIFY_SOURCE=3 -Wp,-D_GLIBCXX_ASSERTIONS -mbranch-protection=standard -fasynchronous-unwind-tables -fstack-clash-protection -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer -w -fpermissive -Wno-sign-compare -Wno-deprecated-declarations -Wno-nonnull -DEIGEN_HAS_CXX11_MATH=1 '
+ export CXXFLAGS
+ FFLAGS='-O2 -fexceptions -grecord-gcc-switches -pipe -Wall -Wno-complain-wrong-lang -Werror=format-security -Wp,-U_FORTIFY_SOURCE,-D_FORTIFY_SOURCE=3 -Wp,-D_GLIBCXX_ASSERTIONS -mbranch-protection=standard -fasynchronous-unwind-tables -fstack-clash-protection -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer -w -fpermissive -Wno-sign-compare -Wno-deprecated-declarations -Wno-nonnull -DEIGEN_HAS_CXX11_MATH=1 -I/usr/lib64/gfortran/modules '
+ export FFLAGS
+ FCFLAGS='-O2 -fexceptions -grecord-gcc-switches -pipe -Wall -Wno-complain-wrong-lang -Werror=format-security -Wp,-U_FORTIFY_SOURCE,-D_FORTIFY_SOURCE=3 -Wp,-D_GLIBCXX_ASSERTIONS -mbranch-protection=standard -fasynchronous-unwind-tables -fstack-clash-protection -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer -w -fpermissive -Wno-sign-compare -Wno-deprecated-declarations -Wno-nonnull -DEIGEN_HAS_CXX11_MATH=1 -I/usr/lib64/gfortran/modules '
+ export FCFLAGS
+ VALAFLAGS=-g
+ export VALAFLAGS
+ RUSTFLAGS='-Copt-level=3 -Cdebuginfo=2 -Ccodegen-units=1 -Cstrip=none -Cforce-frame-pointers=yes -Clink-arg=-specs=/usr/lib/rpm/redhat/redhat-package-notes --cap-lints=warn'
+ export RUSTFLAGS
+ LDFLAGS=-Wl,-lstdc++
+ export LDFLAGS
+ LT_SYS_LIBRARY_PATH=/usr/lib64:
+ export LT_SYS_LIBRARY_PATH
+ CC=gcc
+ export CC
+ CXX=g++
+ export CXX
+ /usr/bin/cmake -DCMAKE_C_FLAGS_RELEASE:STRING=-DNDEBUG -DCMAKE_CXX_FLAGS_RELEASE:STRING=-DNDEBUG -DCMAKE_Fortran_FLAGS_RELEASE:STRING=-DNDEBUG -DCMAKE_VERBOSE_MAKEFILE:BOOL=ON -DCMAKE_INSTALL_DO_STRIP:BOOL=OFF -DCMAKE_INSTALL_PREFIX:PATH=/usr -DINCLUDE_INSTALL_DIR:PATH=/usr/include -DLIB_INSTALL_DIR:PATH=/usr/lib64 -DSYSCONF_INSTALL_DIR:PATH=/etc -DSHARE_INSTALL_PREFIX:PATH=/usr/share -DLIB_SUFFIX=64 -DBUILD_SHARED_LIBS:BOOL=ON ..
-Wno-dev -DCMAKE_SKIP_RPATH=ON -DCMAKE_VERBOSE_MAKEFILE=OFF -DCMAKE_BUILD_TYPE=Release -DCMAKE_NO_SYSTEM_FROM_IMPORTED=ON -DCMAKE_SKIP_RULE_DEPENDENCY=ON -DCMAKE_SUPPRESS_REGENERATION=ON -DUSE_CCACHE=OFF -DHAVE_SOVERSION=ON -DUSE_NATIVE_ARCH=OFF -DUSE_DISTRIBUTED=ON -DBUILD_DOCS=OFF -DBUILD_PYTHON=ON -DBUILD_FUNCTORCH=ON -DBUILD_CAFFE2=OFF -DBUILD_BINARY=OFF -DBUILD_BENCHMARK=OFF -DBUILD_CUSTOM_PROTOBUF=OFF -DBUILDING_WITH_TORCH_LIBS=ON -DPYTHON_EXECUTABLE=/usr/bin/python3 -DPYBIND11_PYTHON_VERSION=3.12 -DCAFFE2_LINK_LOCAL_PROTOBUF=OFF -DONNX_ML=OFF -DUSE_GLOG=ON -DUSE_GFLAGS=ON -DUSE_OPENMP=ON -DUSE_KINETO=ON -DUSE_BREAKPAD=OFF -DUSE_SYSTEM_ONNX=ON -DUSE_SYSTEM_GLOO=ON -DUSE_SYSTEM_PYBIND11=ON -DUSE_SYSTEM_EIGEN_INSTALL=ON -DUSE_CUDA=ON -DUSE_CUDNN=ON -DUSE_NVRTC=ON -DUSE_CUPTI_SO=ON -DUSE_FAST_NVCC=ON -DUSE_SYSTEM_NCCL=ON -DCMAKE_CUDA_FLAGS=-fPIC -DCUDA_PROPAGATE_HOST_FLAGS=OFF '-DTORCH_CUDA_ARCH_LIST=5.2+PTX 6.1 7.5 8.6 8.9 9.0' -DCUDA_HOST_COMPILER=/usr/bin/cuda-g++ -DCMAKE_CUDA_HOST_COMPILER=/usr/bin/cuda-g++ -DCUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda-12.3 -DCMAKE_CUDA_COMPILER=/usr/local/cuda-12.3/bin/nvcc '-DCUDA_NVCC_FLAGS=--compiler-options;-fPIC;-Wno-deprecated-gpu-targets;-allow-unsupported-compiler;--fatbin-options;-compress-all' '-DCMAKE_CUDA_FLAGS=--compiler-options -fPIC -Wno-deprecated-gpu-targets -allow-unsupported-compiler --fatbin-options -compress-all' -DNCCL_INCLUDE_DIR=/usr/include/nccl -DUSE_MAGMA=ON -DBUILD_SPLIT_CUDA=ON -DUSE_TENSORRT=OFF -DBLAS=OpenBLAS -DUSE_MPI=OFF -DUSE_OBSERVERS=OFF -DUSE_ASAN=OFF -DUSE_ROCM=OFF -DUSE_MKLDNN=OFF -DUSE_FBGEMM=OFF -DUSE_NNPACK=ON -DUSE_QNNPACK=ON -DUSE_PYTORCH_QNNPACK=ON -DUSE_SYSTEM_FP16=ON -DUSE_SYSTEM_PSIMD=ON -DUSE_SYSTEM_SLEEF=ON -DUSE_SYSTEM_FXDIV=ON -DUSE_SYSTEM_XNNPACK=OFF -DUSE_SYSTEM_CPUINFO=ON -DUSE_SYSTEM_PTHREADPOOL=ON -DUSE_TENSORPIPE=ON -DUSE_FAKELOWP=OFF -DUSE_OPENCL=OFF -DUSE_GLOO=ON -DUSE_ZMQ=ON -DUSE_ZSTD=ON -DUSE_LMDB=ON -DUSE_REDIS=ON -DUSE_LEVELDB=ON -DUSE_ROCKSDB=ON -DUSE_FFMPEG=OFF -DUSE_OPENCV=ON -DUSE_METAL=OFF -DUSE_TBB=OFF -DUSE_LLVM=OFF -DATEN_NO_TEST=ON -- The CXX compiler identification is GNU 13.2.1 -- The C compiler identification is GNU 13.2.1 -- Detecting CXX compiler ABI info -- Detecting CXX compiler ABI info - done -- Check for working CXX compiler: /usr/bin/g++ - skipped -- Detecting CXX compile features -- Detecting CXX compile features - done -- Detecting C compiler ABI info -- Detecting C compiler ABI info - done -- Check for working C compiler: /usr/bin/gcc - skipped -- Detecting C compile features -- Detecting C compile features - done -- /usr/bin/g++ /builddir/build/BUILD/pytorch/torch/abi-check.cpp -o /builddir/build/BUILD/pytorch/build/abi-check -- Determined _GLIBCXX_USE_CXX11_ABI=1 -- Performing Test CAFFE2_NEED_TO_TURN_OFF_DEPRECATION_WARNING -- Performing Test CAFFE2_NEED_TO_TURN_OFF_DEPRECATION_WARNING - Failed -- Turning off deprecation warning due to glog. 
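The configure command above is hard to read as one line; as a condensed sketch, these are just the cache entries that drive the CUDA side of it (every flag below is copied from the full command line, all other -D options are omitted):

  # Abbreviated re-run of the configure step, CUDA-related flags only.
  mkdir -p build && cd build
  /usr/bin/cmake .. -Wno-dev \
    -DCMAKE_BUILD_TYPE=Release \
    -DBUILD_SHARED_LIBS:BOOL=ON \
    -DUSE_CUDA=ON -DUSE_CUDNN=ON -DUSE_NVRTC=ON -DUSE_MAGMA=ON -DUSE_SYSTEM_NCCL=ON \
    -DCUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda-12.3 \
    -DCMAKE_CUDA_COMPILER=/usr/local/cuda-12.3/bin/nvcc \
    -DCMAKE_CUDA_HOST_COMPILER=/usr/bin/cuda-g++ \
    -DNCCL_INCLUDE_DIR=/usr/include/nccl \
    '-DTORCH_CUDA_ARCH_LIST=5.2+PTX 6.1 7.5 8.6 8.9 9.0'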
-- Performing Test C_HAS_AVX_1 -- Performing Test C_HAS_AVX_1 - Failed -- Performing Test C_HAS_AVX_2 -- Performing Test C_HAS_AVX_2 - Failed -- Performing Test C_HAS_AVX_3 -- Performing Test C_HAS_AVX_3 - Failed -- Performing Test C_HAS_AVX2_1 -- Performing Test C_HAS_AVX2_1 - Failed -- Performing Test C_HAS_AVX2_2 -- Performing Test C_HAS_AVX2_2 - Failed -- Performing Test C_HAS_AVX2_3 -- Performing Test C_HAS_AVX2_3 - Failed -- Performing Test C_HAS_AVX512_1 -- Performing Test C_HAS_AVX512_1 - Failed -- Performing Test C_HAS_AVX512_2 -- Performing Test C_HAS_AVX512_2 - Failed -- Performing Test C_HAS_AVX512_3 -- Performing Test C_HAS_AVX512_3 - Failed -- Performing Test CXX_HAS_AVX_1 -- Performing Test CXX_HAS_AVX_1 - Failed -- Performing Test CXX_HAS_AVX_2 -- Performing Test CXX_HAS_AVX_2 - Failed -- Performing Test CXX_HAS_AVX_3 -- Performing Test CXX_HAS_AVX_3 - Failed -- Performing Test CXX_HAS_AVX2_1 -- Performing Test CXX_HAS_AVX2_1 - Failed -- Performing Test CXX_HAS_AVX2_2 -- Performing Test CXX_HAS_AVX2_2 - Failed -- Performing Test CXX_HAS_AVX2_3 -- Performing Test CXX_HAS_AVX2_3 - Failed -- Performing Test CXX_HAS_AVX512_1 -- Performing Test CXX_HAS_AVX512_1 - Failed -- Performing Test CXX_HAS_AVX512_2 -- Performing Test CXX_HAS_AVX512_2 - Failed -- Performing Test CXX_HAS_AVX512_3 -- Performing Test CXX_HAS_AVX512_3 - Failed -- Performing Test CAFFE2_COMPILER_SUPPORTS_AVX512_EXTENSIONS -- Performing Test CAFFE2_COMPILER_SUPPORTS_AVX512_EXTENSIONS - Failed -- Performing Test COMPILER_SUPPORTS_HIDDEN_VISIBILITY -- Performing Test COMPILER_SUPPORTS_HIDDEN_VISIBILITY - Success -- Performing Test COMPILER_SUPPORTS_HIDDEN_INLINE_VISIBILITY -- Performing Test COMPILER_SUPPORTS_HIDDEN_INLINE_VISIBILITY - Success -- Performing Test COMPILER_SUPPORTS_RDYNAMIC -- Performing Test COMPILER_SUPPORTS_RDYNAMIC - Success -- Found CUDA: /usr/local/cuda-12.3 (found version "12.3") -- The CUDA compiler identification is NVIDIA 12.3.107 -- Detecting CUDA compiler ABI info -- Detecting CUDA compiler ABI info - done -- Check for working CUDA compiler: /usr/local/cuda-12.3/bin/nvcc - skipped -- Detecting CUDA compile features -- Detecting CUDA compile features - done -- Found CUDAToolkit: /usr/local/cuda-12.3/include (found version "12.3.107") -- Performing Test CMAKE_HAVE_LIBC_PTHREAD -- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success -- Found Threads: TRUE -- Caffe2: CUDA detected: 12.3 -- Caffe2: CUDA nvcc is: /usr/local/cuda-12.3/bin/nvcc -- Caffe2: CUDA toolkit directory: /usr/local/cuda-12.3 -- Caffe2: Header version is: 12.3 -- /usr/local/cuda-12.3/lib64/libnvrtc.so shorthash is 543806da -- Found CUDNN: /usr/lib64/libcudnn.so -- Could NOT find CUSPARSELT (missing: CUSPARSELT_LIBRARY_PATH CUSPARSELT_INCLUDE_PATH) CMake Warning at cmake/public/cuda.cmake:275 (message): Cannot find cuSPARSELt library. Turning the option off Call Stack (most recent call first): cmake/Dependencies.cmake:44 (include) CMakeLists.txt:760 (include) -- Added CUDA NVCC flags for: -gencode;arch=compute_52,code=sm_52;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_89,code=sm_89;-gencode;arch=compute_90,code=sm_90;-gencode;arch=compute_52,code=compute_52 -- Caffe2: Found protobuf with new-style protobuf targets. 
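The "Added CUDA NVCC flags for:" line is the expansion of TORCH_CUDA_ARCH_LIST from the configure call: each architecture becomes one -gencode pair producing native SASS, and the "+PTX" suffix on 5.2 adds a compute_52 PTX entry so future GPUs can JIT-compile it. An illustration, not taken from this build (kernel.cu and kernel.o are placeholders):

  # Compiling a single file with the same fatbinary contents CMake derived:
  nvcc -c kernel.cu -o kernel.o \
    -gencode arch=compute_52,code=sm_52 \
    -gencode arch=compute_61,code=sm_61 \
    -gencode arch=compute_75,code=sm_75 \
    -gencode arch=compute_86,code=sm_86 \
    -gencode arch=compute_89,code=sm_89 \
    -gencode arch=compute_90,code=sm_90 \
    -gencode arch=compute_52,code=compute_52  # the "+PTX" part: PTX kept for forward JIT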
-- Caffe2 protobuf include directory: /usr/include -- Trying to find preferred BLAS backend of choice: OpenBLAS -- Found OpenBLAS libraries: /usr/lib64/libopenblaso.so -- Found OpenBLAS include: /usr/include/openblas -- Using pocketfft in directory: /builddir/build/BUILD/pytorch/third_party/pocketfft/ -- Found pthreadpool: /usr/lib64/libpthreadpool.so Found cpuinfo: /usr/lib64/libcpuinfo.so -- The ASM compiler identification is GNU -- Found assembler: /usr/bin/gcc -- Caffe2: Found gflags with new-style gflags target. -- Caffe2: Cannot find glog automatically. Using legacy find. -- Found glog: /usr/include -- Caffe2: Found glog (include: /usr/include, library: /usr/lib64/libglog.so) CMake Warning at cmake/Dependencies.cmake:848 (message): Turning USE_FAKELOWP off as it depends on USE_FBGEMM. Call Stack (most recent call first): CMakeLists.txt:760 (include) -- Found LMDB: /usr/include -- Found lmdb (include: /usr/include, library: /usr/lib64/liblmdb.so) -- Found LevelDB: /usr/include -- Found LevelDB (include: /usr/include, library: /usr/lib64/libleveldb.so) -- Found Snappy: /usr/include -- Found Snappy (include: /usr/include, library: /usr/lib64/libsnappy.so) -- Found Numa: /usr/include -- Found Numa (include: /usr/include, library: /usr/lib64/libnuma.so) -- Found ZMQ: /usr/include -- Found ZMQ (include: /usr/include, library: /usr/lib64/libzmq.so) -- Found Hiredis: /usr/include -- Found Hiredis (include: /usr/include, library: /usr/lib64/libhiredis.so) -- OpenCV found (/usr/lib64/cmake/opencv4) -- Found system Eigen at /usr/include/eigen3 -- Setting Python's include dir to /usr/include/python3.12 from sysconfig -- Setting Python's library to /usr/lib64/python3.12 -- Found PythonInterp: /usr/bin/python3 (found suitable version "3.12.2", minimum required is "3.0") -- Found PythonLibs: /usr/lib64/python3.12 (found suitable version "3.12.2", minimum required is "3.0") -- Found NumPy: /usr/lib64/python3.12/site-packages/numpy/core/include (found version "1.24.4") -- NumPy ver. 1.24.4 found (include: /usr/lib64/python3.12/site-packages/numpy/core/include) -- Found PythonInterp: /usr/bin/python3 (found suitable version "3.12.2", minimum required is "3.12") -- Found PythonLibs: /usr/lib64/python3.12 -- Performing Test HAS_FLTO -- Performing Test HAS_FLTO - Success -- Found pybind11: /usr/include (found version "2.11.1") -- pybind11 include dirs: /usr/include;/usr/include/python3.12 -- Check OMP with lib /usr/lib/gcc/aarch64-redhat-linux/13/libgomp.so and flags -fopenmp -v -- Check OMP with lib /usr/lib/gcc/aarch64-redhat-linux/13/libgomp.so and flags -fopenmp -v -- Found OpenMP_C: -fopenmp (found version "4.5") -- Found OpenMP_CXX: -fopenmp (found version "4.5") -- Found OpenMP: TRUE (found version "4.5") -- Adding OpenMP CXX_FLAGS: -fopenmp -- Will link against OpenMP libraries: /usr/lib/gcc/aarch64-redhat-linux/13/libgomp.so -- Found NCCL: /usr/include -- Determining NCCL version from /usr/include/nccl.h... 
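One detail worth flagging above: CMake picked /usr/lib64/libopenblaso.so, the OpenMP-threaded OpenBLAS variant, which it only considers because the spec patched FindOpenBLAS.cmake earlier ('NAMES openblaso openblas'). A quick way to see which variants a box offers; the serial/"p"/"o" suffix convention is Fedora packaging, stated here as an assumption:

  # Serial, pthreads ("p") and OpenMP ("o") OpenBLAS builds side by side:
  ls -l /usr/lib64/libopenblas*.so*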
-- Looking for NCCL_VERSION_CODE -- Looking for NCCL_VERSION_CODE - not found -- NCCL version < 2.3.5-5 -- Found NCCL (include: /usr/include, library: /usr/lib64/libnccl.so) -- Found CUB: /usr/local/cuda-12.3/include -- Converting CMAKE_CUDA_FLAGS to CUDA_NVCC_FLAGS: CUDA_NVCC_FLAGS = --compiler-options;-fPIC;-Wno-deprecated-gpu-targets;-allow-unsupported-compiler;--fatbin-options;-compress-all;-DLIBCUDACXX_ENABLE_SIMPLIFIED_COMPLEX_OPERATIONS;-D_GLIBCXX_USE_CXX11_ABI=1;-Xfatbin;-compress-all;--compiler-options;-fPIC;-Wno-deprecated-gpu-targets;-allow-unsupported-compiler;--fatbin-options;-compress-all;-DONNX_NAMESPACE=onnx;-gencode;arch=compute_52,code=sm_52;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_89,code=sm_89;-gencode;arch=compute_90,code=sm_90;-gencode;arch=compute_52,code=compute_52;-Xcudafe;--diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl;--expt-relaxed-constexpr;--expt-extended-lambda CUDA_NVCC_FLAGS_DEBUG = -g CUDA_NVCC_FLAGS_RELEASE = -O3;-DNDEBUG CUDA_NVCC_FLAGS_RELWITHDEBINFO = -O2;-g;-DNDEBUG CUDA_NVCC_FLAGS_MINSIZEREL = -O1;-DNDEBUG Found gloo: /usr/lib64/libgloo.so -- Found onnx: /usr/lib64/libonnx.so /usr/lib64/libonnx_proto.so -- Found CUDA with FP16 support, compiling with torch.cuda.HalfTensor -- Adding -DNDEBUG to compile flags -- Checking prototype magma_get_sgeqrf_nb for MAGMA_V2 -- Checking prototype magma_get_sgeqrf_nb for MAGMA_V2 - False -- Compiling with MAGMA support -- MAGMA INCLUDE DIRECTORIES: /usr/include -- MAGMA LIBRARIES: /usr/lib64/libmagma.so -- MAGMA V2 check: 0 -- Could not find hardware support for NEON on this machine. -- No OMAP3 processor on this machine. -- No OMAP4 processor on this machine. -- asimd/Neon found with compiler flag : -D__NEON__ -- Looking for cheev_ -- Looking for cheev_ - found -- Looking for cgesdd_ -- Looking for cgesdd_ - found -- Found a library with LAPACK API (open). -- MIOpen not found. 
Compiling without MIOpen support disabling ROCM because NOT USE_ROCM is set disabling MKLDNN because USE_MKLDNN is not set -- Looking for clock_gettime in rt -- Looking for clock_gettime in rt - found -- Looking for mmap -- Looking for mmap - found -- Looking for shm_open -- Looking for shm_open - found -- Looking for shm_unlink -- Looking for shm_unlink - found -- Looking for malloc_usable_size -- Looking for malloc_usable_size - found -- -- check z16 -- Performing Test COMPILE_OUT_z16 -- Performing Test COMPILE_OUT_z16 - Failed -- Performing Test COMPILE_OUT_z15 -- check z15 -- Performing Test COMPILE_OUT_z15 - Failed -- Performing Test COMPILE_OUT_z14 -- check z14 -- Performing Test COMPILE_OUT_z14 - Failed -- -- Version: 10.2.1 -- Build type: Release -- Using Kineto with CUPTI support -- Configuring Kineto dependency: -- KINETO_SOURCE_DIR = /builddir/build/BUILD/pytorch/third_party/kineto/libkineto -- KINETO_BUILD_TESTS = OFF -- KINETO_LIBRARY_TYPE = static -- CUDA_SOURCE_DIR = /usr/local/cuda-12.3 -- CUDA_INCLUDE_DIRS = /usr/local/cuda-12.3/include -- CUPTI_INCLUDE_DIR = /usr/local/cuda-12.3/include -- CUDA_cupti_LIBRARY = /usr/local/cuda-12.3/lib64/libcupti.so -- Found CUPTI -- Configured Kineto -- GCC 13.2.1: Adding gcc and gcc_s libs to link line -- Performing Test HAS_WERROR_RETURN_TYPE -- Performing Test HAS_WERROR_RETURN_TYPE - Success -- Performing Test HAS_WERROR_NON_VIRTUAL_DTOR -- Performing Test HAS_WERROR_NON_VIRTUAL_DTOR - Success -- Performing Test HAS_WERROR_BRACED_SCALAR_INIT -- Performing Test HAS_WERROR_BRACED_SCALAR_INIT - Failed -- Performing Test HAS_WERROR_RANGE_LOOP_CONSTRUCT -- Performing Test HAS_WERROR_RANGE_LOOP_CONSTRUCT - Success -- Performing Test HAS_WERROR_BOOL_OPERATION -- Performing Test HAS_WERROR_BOOL_OPERATION - Success -- Performing Test HAS_WNARROWING -- Performing Test HAS_WNARROWING - Success -- Performing Test HAS_WNO_MISSING_FIELD_INITIALIZERS -- Performing Test HAS_WNO_MISSING_FIELD_INITIALIZERS - Success -- Performing Test HAS_WNO_TYPE_LIMITS -- Performing Test HAS_WNO_TYPE_LIMITS - Success -- Performing Test HAS_WNO_ARRAY_BOUNDS -- Performing Test HAS_WNO_ARRAY_BOUNDS - Success -- Performing Test HAS_WNO_UNKNOWN_PRAGMAS -- Performing Test HAS_WNO_UNKNOWN_PRAGMAS - Success -- Performing Test HAS_WNO_UNUSED_PARAMETER -- Performing Test HAS_WNO_UNUSED_PARAMETER - Success -- Performing Test HAS_WNO_UNUSED_FUNCTION -- Performing Test HAS_WNO_UNUSED_FUNCTION - Success -- Performing Test HAS_WNO_UNUSED_RESULT -- Performing Test HAS_WNO_UNUSED_RESULT - Success -- Performing Test HAS_WNO_STRICT_OVERFLOW -- Performing Test HAS_WNO_STRICT_OVERFLOW - Success -- Performing Test HAS_WNO_STRICT_ALIASING -- Performing Test HAS_WNO_STRICT_ALIASING - Success -- Performing Test HAS_WNO_STRINGOP_OVERFLOW -- Performing Test HAS_WNO_STRINGOP_OVERFLOW - Success -- Performing Test HAS_WVLA_EXTENSION -- Performing Test HAS_WVLA_EXTENSION - Failed -- Performing Test HAS_WSUGGEST_OVERRIDE -- Performing Test HAS_WSUGGEST_OVERRIDE - Success -- Performing Test HAS_WNEWLINE_EOF -- Performing Test HAS_WNEWLINE_EOF - Failed -- Performing Test HAS_WINCONSISTENT_MISSING_OVERRIDE -- Performing Test HAS_WINCONSISTENT_MISSING_OVERRIDE - Failed -- Performing Test HAS_WINCONSISTENT_MISSING_DESTRUCTOR_OVERRIDE -- Performing Test HAS_WINCONSISTENT_MISSING_DESTRUCTOR_OVERRIDE - Failed -- Performing Test HAS_WNO_ERROR_PEDANTIC -- Performing Test HAS_WNO_ERROR_PEDANTIC - Success -- Performing Test HAS_WNO_ERROR_OLD_STYLE_CAST -- Performing Test HAS_WNO_ERROR_OLD_STYLE_CAST - Success 
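Most of the "Performing Test HAS_..." lines above and below are CMake probing which diagnostic flags this compiler accepts; the failed ones (HAS_WNEWLINE_EOF, HAS_QUNUSED_ARGUMENTS, HAS_WINCONSISTENT_MISSING_OVERRIDE, and so on) are mostly Clang-only spellings, so failure under g++ 13 is expected. A rough shell equivalent of a single probe, assuming a scratch test.cpp; CMake's real CheckCXXCompilerFlag also pattern-matches warning output rather than relying on -Werror alone:

  # Does the compiler accept -Wsuggest-override? Compile a trivial TU and see.
  echo 'int main(){return 0;}' > test.cpp
  g++ -Werror -Wsuggest-override -c test.cpp -o /dev/null \
    && echo 'Performing Test ... - Success' || echo 'Performing Test ... - Failed'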
-- Performing Test HAS_WNO_ERROR_INCONSISTENT_MISSING_OVERRIDE -- Performing Test HAS_WNO_ERROR_INCONSISTENT_MISSING_OVERRIDE - Failed -- Performing Test HAS_WNO_ERROR_INCONSISTENT_MISSING_DESTRUCTOR_OVERRIDE -- Performing Test HAS_WNO_ERROR_INCONSISTENT_MISSING_DESTRUCTOR_OVERRIDE - Failed -- Performing Test HAS_WCONSTANT_CONVERSION -- Performing Test HAS_WCONSTANT_CONVERSION - Failed -- Performing Test HAS_WNO_INVALID_PARTIAL_SPECIALIZATION -- Performing Test HAS_WNO_INVALID_PARTIAL_SPECIALIZATION - Failed -- Performing Test HAS_WNO_ALIGNED_ALLOCATION_UNAVAILABLE -- Performing Test HAS_WNO_ALIGNED_ALLOCATION_UNAVAILABLE - Failed -- Performing Test HAS_WNO_MISSING_BRACES -- Performing Test HAS_WNO_MISSING_BRACES - Success -- Performing Test HAS_QUNUSED_ARGUMENTS -- Performing Test HAS_QUNUSED_ARGUMENTS - Failed -- Performing Test HAS_FDIAGNOSTICS_COLOR_ALWAYS -- Performing Test HAS_FDIAGNOSTICS_COLOR_ALWAYS - Success -- Performing Test HAS_FALIGNED_NEW -- Performing Test HAS_FALIGNED_NEW - Success -- Performing Test HAS_WNO_UNUSED_BUT_SET_VARIABLE -- Performing Test HAS_WNO_UNUSED_BUT_SET_VARIABLE - Success -- Performing Test HAS_WNO_MAYBE_UNINITIALIZED -- Performing Test HAS_WNO_MAYBE_UNINITIALIZED - Success -- Performing Test HAS_FSTANDALONE_DEBUG -- Performing Test HAS_FSTANDALONE_DEBUG - Failed -- Performing Test HAS_FNO_MATH_ERRNO -- Performing Test HAS_FNO_MATH_ERRNO - Success -- Performing Test HAS_FNO_TRAPPING_MATH -- Performing Test HAS_FNO_TRAPPING_MATH - Success -- Performing Test HAS_WERROR_FORMAT -- Performing Test HAS_WERROR_FORMAT - Success -- Performing Test HAS_VST1 -- Performing Test HAS_VST1 - Success -- Performing Test HAS_VLD1 -- Performing Test HAS_VLD1 - Success -- Performing Test HAS_WDEPRECATED -- Performing Test HAS_WDEPRECATED - Success -- NUMA paths: -- /usr/include -- /usr/lib64/libnuma.so -- Looking for backtrace -- Looking for backtrace - found -- backtrace facility detected in default set of libraries -- Found Backtrace: /usr/include -- headers outputs: -- sources outputs: -- declarations_yaml outputs: -- Using ATen parallel backend: OMP Found sleef: /usr/lib64/libsleef.so AT_INSTALL_INCLUDE_DIR include/ATen/core core header install: /builddir/build/BUILD/pytorch/build/aten/src/ATen/core/TensorBody.h core header install: /builddir/build/BUILD/pytorch/build/aten/src/ATen/core/aten_interned_strings.h core header install: /builddir/build/BUILD/pytorch/build/aten/src/ATen/core/enum_tag.h disable test because ATEN_NO_TEST is set -- Performing Test HAS_WNO_DEPRECATED_COPY -- Performing Test HAS_WNO_DEPRECATED_COPY - Success -- _GLIBCXX_USE_CXX11_ABI=1 is already defined as a cmake variable -- Using lib/python3.12/site-packages as python relative installation path -- -- ******** Summary ******** -- General: -- CMake version : 3.27.7 -- CMake command : /usr/bin/cmake -- System : Linux -- C++ compiler : /usr/bin/g++ -- C++ compiler id : GNU -- C++ compiler version : 13.2.1 -- Using ccache if found : OFF -- CXX flags : -O2 -fexceptions -grecord-gcc-switches -pipe -Wall -Wno-complain-wrong-lang -Werror=format-security -Wp,-U_FORTIFY_SOURCE,-D_FORTIFY_SOURCE=3 -Wp,-D_GLIBCXX_ASSERTIONS -mbranch-protection=standard -fasynchronous-unwind-tables -fstack-clash-protection -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer -w -fpermissive -Wno-sign-compare -Wno-deprecated-declarations -Wno-nonnull -DEIGEN_HAS_CXX11_MATH=1 -D_GLIBCXX_USE_CXX11_ABI=1 -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DTMP_LIBKINETO_NANOSECOND 
-DLIBKINETO_NOROCTRACER -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=range-loop-construct -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-stringop-overflow -Wsuggest-override -Wno-psabi -Wno-error=pedantic -Wno-error=old-style-cast -Wno-missing-braces -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow -- Build type : Release -- Compile definitions : ONNXIFI_ENABLE_EXT=1;ONNX_NAMESPACE=onnx;HAVE_MMAP=1;_FILE_OFFSET_BITS=64;HAVE_SHM_OPEN=1;HAVE_SHM_UNLINK=1;HAVE_MALLOC_USABLE_SIZE=1;USE_EXTERNAL_MZCRC;MINIZ_DISABLE_ZIP_READER_CRC32_CHECKS;FLASHATTENTION_DISABLE_ALIBI -- CMAKE_PREFIX_PATH : /usr/local/cuda-12.3;/usr/local/cuda-12.3;/usr/local/cuda-12.3 -- CMAKE_INSTALL_PREFIX : /usr -- USE_GOLD_LINKER : OFF -- -- TORCH_VERSION : 2.4.0 -- BUILD_CAFFE2 : OFF -- BUILD_CAFFE2_OPS : OFF -- BUILD_STATIC_RUNTIME_BENCHMARK: OFF -- BUILD_BINARY : OFF -- BUILD_CUSTOM_PROTOBUF : OFF -- Protobuf compiler : /usr/bin/protoc -- Protobuf includes : /usr/include -- Protobuf libraries : /usr/lib64/libprotobuf.so -- BUILD_DOCS : OFF -- BUILD_PYTHON : ON -- Python version : 3.12.2 -- Python executable : /usr/bin/python3 -- Pythonlibs version : 3.12.2 -- Python library : /usr/lib64/python3.12 -- Python includes : /usr/include/python3.12 -- Python site-packages: lib/python3.12/site-packages -- BUILD_SHARED_LIBS : ON -- CAFFE2_USE_MSVC_STATIC_RUNTIME : OFF -- BUILD_TEST : OFF -- BUILD_JNI : OFF -- BUILD_MOBILE_AUTOGRAD : OFF -- BUILD_LITE_INTERPRETER: OFF -- INTERN_BUILD_MOBILE : -- TRACING_BASED : OFF -- USE_BLAS : 1 -- BLAS : open -- BLAS_HAS_SBGEMM : -- USE_LAPACK : 1 -- LAPACK : open -- USE_ASAN : OFF -- USE_TSAN : OFF -- USE_CPP_CODE_COVERAGE : OFF -- USE_CUDA : ON -- Split CUDA : ON -- CUDA static link : OFF -- USE_CUDNN : ON -- USE_EXPERIMENTAL_CUDNN_V8_API: -- USE_CUSPARSELT : OFF -- CUDA version : 12.3 -- USE_FLASH_ATTENTION : ON -- USE_MEM_EFF_ATTENTION : ON -- cuDNN version : 8.9.7 -- CUDA root directory : /usr/local/cuda-12.3 -- CUDA library : /usr/local/cuda-12.3/lib64/stubs/libcuda.so -- cudart library : /usr/local/cuda-12.3/lib64/libcudart.so -- cublas library : /usr/local/cuda-12.3/lib64/libcublas.so -- cufft library : /usr/local/cuda-12.3/lib64/libcufft.so -- curand library : /usr/local/cuda-12.3/lib64/libcurand.so -- cusparse library : /usr/local/cuda-12.3/lib64/libcusparse.so -- cuDNN library : /usr/lib64/libcudnn.so -- nvrtc : /usr/local/cuda-12.3/lib64/libnvrtc.so -- CUDA include path : /usr/local/cuda-12.3/include -- NVCC executable : /usr/local/cuda-12.3/bin/nvcc -- CUDA compiler : /usr/local/cuda-12.3/bin/nvcc -- CUDA flags : --compiler-options -fPIC -Wno-deprecated-gpu-targets -allow-unsupported-compiler --fatbin-options -compress-all -DLIBCUDACXX_ENABLE_SIMPLIFIED_COMPLEX_OPERATIONS -D_GLIBCXX_USE_CXX11_ABI=1 -Xfatbin -compress-all --compiler-options -fPIC -Wno-deprecated-gpu-targets -allow-unsupported-compiler --fatbin-options -compress-all -DONNX_NAMESPACE=onnx -gencode arch=compute_52,code=sm_52 -gencode arch=compute_61,code=sm_61 -gencode arch=compute_75,code=sm_75 -gencode arch=compute_86,code=sm_86 -gencode arch=compute_89,code=sm_89 -gencode arch=compute_90,code=sm_90 
-gencode arch=compute_52,code=compute_52 -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -DCUDA_HAS_FP16 -Wno-deprecated-gpu-targets --expt-extended-lambda -DCUB_WRAPPED_NAMESPACE=at_cuda_detail -DCUDA_HAS_FP16=1 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -- CUDA host compiler : /usr/bin/cuda-g++ -- CUDA --device-c : OFF -- USE_TENSORRT : OFF -- USE_XPU : OFF -- USE_ROCM : OFF -- BUILD_NVFUSER : -- USE_EIGEN_FOR_BLAS : -- USE_FBGEMM : OFF -- USE_FAKELOWP : OFF -- USE_KINETO : ON -- USE_FFMPEG : OFF -- USE_GFLAGS : ON -- USE_GLOG : ON -- USE_LEVELDB : ON -- LevelDB version : 1.23 -- Snappy version : 1.1.10 -- USE_LITE_PROTO : OFF -- USE_LMDB : ON -- LMDB version : 0.9.32 -- USE_METAL : OFF -- USE_PYTORCH_METAL : OFF -- USE_PYTORCH_METAL_EXPORT : OFF -- USE_MPS : OFF -- USE_MKL : -- USE_MKLDNN : OFF -- USE_UCC : OFF -- USE_ITT : OFF -- USE_NCCL : ON -- USE_SYSTEM_NCCL : ON -- USE_NNPACK : ON -- USE_NUMPY : ON -- USE_OBSERVERS : ON -- USE_OPENCL : OFF -- USE_OPENCV : ON -- OpenCV version : 4.9.0 -- USE_OPENMP : ON -- USE_TBB : OFF -- USE_MIMALLOC : OFF -- USE_VULKAN : OFF -- USE_PROF : OFF -- USE_QNNPACK : ON -- USE_PYTORCH_QNNPACK : ON -- USE_XNNPACK : ON -- USE_REDIS : ON -- USE_ROCKSDB : ON -- USE_ZMQ : ON -- USE_DISTRIBUTED : ON -- USE_MPI : OFF -- USE_GLOO : ON -- USE_GLOO_WITH_OPENSSL : OFF -- USE_TENSORPIPE : ON -- Public Dependencies : -- Private Dependencies : Threads::Threads;/usr/lib64/libopenblaso.so;pthreadpool;cpuinfo;qnnpack;pytorch_qnnpack;XNNPACK;/usr/lib64/liblmdb.so;/usr/lib64/libleveldb.so;/usr/lib64/libsnappy.so;/usr/lib64/libzmq.so;/usr/lib64/libhiredis.so;opencv_core;opencv_highgui;opencv_imgproc;opencv_imgcodecs;opencv_optflow;opencv_videoio;opencv_video;caffe2::openmp;tensorpipe;gloo;onnx_proto;onnx;onnx_optimizer;foxi_loader;rt;fmt::fmt-header-only;kineto;gcc_s;gcc;dl -- Public CUDA Deps. : caffe2::cuda;caffe2::nvrtc -- Private CUDA Deps. : caffe2::curand;caffe2::cufft;caffe2::cublas;torch::cudnn;__caffe2_nccl;tensorpipe_cuda;gloo_cuda;/usr/local/cuda-12.3/lib64/libcudart.so;CUDA::cusparse;CUDA::cufft;ATEN_CUDA_FILES_GEN_LIB -- USE_COREML_DELEGATE : OFF -- BUILD_LAZY_TS_BACKEND : ON -- USE_ROCM_KERNEL_ASSERT : OFF -- Performing Test HAS_WMISSING_PROTOTYPES -- Performing Test HAS_WMISSING_PROTOTYPES - Success -- Performing Test HAS_WERROR_MISSING_PROTOTYPES -- Performing Test HAS_WERROR_MISSING_PROTOTYPES - Success -- Configuring done (22.7s) CMake Warning at torch/CMakeLists.txt:282 (target_link_libraries): Target "_C" requests linking to directory "/usr/lib64/python3.12". Targets may link only to libraries. CMake is dropping the item. 
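Two things in the output above deserve a gloss. First, the summary's "CUDA library" points into lib64/stubs: the build chroot has no NVIDIA driver, so linking is done against the stub libcuda.so and the real driver library is resolved only at run time. Second, the _C warning is CMake refusing to link a target against a directory path (/usr/lib64/python3.12) and dropping the item. A hedged illustration of the stub-linking pattern (app.o and app are placeholders):

  # Link against the driver stub on a driver-less build machine:
  g++ app.o -L/usr/local/cuda-12.3/lib64/stubs -lcuda -o app
  # On a machine with the NVIDIA driver installed, the dynamic loader then
  # resolves libcuda.so.1 from the driver, not from the stubs directory:
  ldd app | grep libcuda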
-- Generating done (0.9s) CMake Warning: Manually-specified variables were not used by the project: CMAKE_Fortran_FLAGS_RELEASE CMAKE_INSTALL_DO_STRIP INCLUDE_INSTALL_DIR LIB_INSTALL_DIR LIB_SUFFIX SHARE_INSTALL_PREFIX SYSCONF_INSTALL_DIR USE_BREAKPAD USE_FAST_NVCC -- Build files have been written to: /builddir/build/BUILD/pytorch/build + make -j4 [ 0%] Building C object confu-deps/clog/CMakeFiles/clog.dir/src/clog.c.o [ 0%] Linking C static library ../../lib/libfxdiv.a [ 0%] Linking C static library ../../lib/libfp16.a [ 0%] Linking C static library ../../lib/libpsimd.a [ 0%] Built target fp16 [ 0%] Built target fxdiv [ 0%] Built target psimd [ 0%] Building C object confu-deps/XNNPACK/CMakeFiles/logging.dir/src/enums/datatype-strings.c.o [ 0%] Building C object confu-deps/XNNPACK/CMakeFiles/allocator.dir/src/allocator.c.o [ 0%] Building C object confu-deps/XNNPACK/CMakeFiles/normalization.dir/src/normalization.c.o [ 0%] Linking C static library ../../lib/libclog.a [ 0%] Building C object confu-deps/XNNPACK/CMakeFiles/logging.dir/src/enums/microkernel-type.c.o [ 0%] Built target allocator [ 0%] Building C object confu-deps/XNNPACK/CMakeFiles/logging.dir/src/enums/node-type.c.o [ 0%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernel-utils.dir/src/microkernel-utils.c.o [ 0%] Building C object confu-deps/XNNPACK/CMakeFiles/logging.dir/src/enums/operator-type.c.o [ 0%] Built target clog [ 0%] Built target normalization [ 0%] Building C object confu-deps/XNNPACK/CMakeFiles/logging.dir/src/log.c.o [ 0%] Building CXX object confu-deps/XNNPACK/CMakeFiles/convolution-test-helpers.dir/test/convolution-test-helpers.cc.o [ 0%] Built target microkernel-utils [ 0%] Building CXX object third_party/fmt/CMakeFiles/fmt.dir/src/format.cc.o [ 0%] Building CXX object c10/CMakeFiles/c10.dir/core/Allocator.cpp.o [ 0%] Built target logging [ 0%] Running C++/Python protocol buffer compiler on /builddir/build/BUILD/pytorch/caffe2/proto/torch.proto [ 0%] Running C++/Python protocol buffer compiler on /builddir/build/BUILD/pytorch/caffe2/proto/caffe2.proto [ 0%] Building CXX object caffe2/proto/CMakeFiles/Caffe2_PROTO.dir/torch.pb.cc.o [ 0%] Built target convolution-test-helpers [ 0%] Building CXX object caffe2/CMakeFiles/caffe2_nvrtc.dir/__/aten/src/ATen/cuda/nvrtc_stub/ATenNVRTC.cpp.o [ 0%] Building CXX object c10/CMakeFiles/c10.dir/core/AutogradState.cpp.o [ 0%] Building CXX object c10/CMakeFiles/c10.dir/core/CPUAllocator.cpp.o [ 0%] Linking CXX shared library ../lib/libcaffe2_nvrtc.so Warning: Unused direct dependencies: libcuda.so.1 /lib64/libm.so.6 /lib64/libgcc_s.so.1 [ 0%] Built target caffe2_nvrtc [ 0%] Generating ATen headers [ 0%] Building CXX object c10/CMakeFiles/c10.dir/core/ConstantSymNodeImpl.cpp.o [ 0%] Building CXX object caffe2/proto/CMakeFiles/Caffe2_PROTO.dir/caffe2.pb.cc.o [ 0%] Building CXX object c10/CMakeFiles/c10.dir/core/CopyBytes.cpp.o [ 0%] Building CXX object c10/CMakeFiles/c10.dir/core/DefaultDtype.cpp.o [ 0%] Building CXX object third_party/fmt/CMakeFiles/fmt.dir/src/os.cc.o [ 0%] Linking CXX static library ../../lib/libfmt.a [ 0%] Built target fmt [ 0%] Building CXX object c10/CMakeFiles/c10.dir/core/Device.cpp.o [ 0%] Generating ATen headers [ 0%] Building CXX object c10/CMakeFiles/c10.dir/core/DeviceType.cpp.o [ 0%] Building CXX object c10/CMakeFiles/c10.dir/core/DispatchKey.cpp.o [ 0%] Building CXX object c10/CMakeFiles/c10.dir/core/DispatchKeySet.cpp.o [ 0%] Built target Caffe2_PROTO [ 0%] Generating ATen sources [ 0%] Generating ATen sources [ 0%] Building CXX object 
c10/CMakeFiles/c10.dir/core/GeneratorImpl.cpp.o [ 0%] Building CXX object c10/CMakeFiles/c10.dir/core/GradMode.cpp.o [ 0%] Building CXX object c10/CMakeFiles/c10.dir/core/InferenceMode.cpp.o [ 1%] Building CXX object c10/CMakeFiles/c10.dir/core/RefcountedDeleter.cpp.o [ 1%] Generating ATen declarations_yaml [ 1%] Building CXX object c10/CMakeFiles/c10.dir/core/SafePyObject.cpp.o [ 1%] Building CXX object c10/CMakeFiles/c10.dir/core/Scalar.cpp.o [ 1%] Building CXX object c10/CMakeFiles/c10.dir/core/ScalarType.cpp.o [ 1%] Building CXX object c10/CMakeFiles/c10.dir/core/Storage.cpp.o [ 1%] Building CXX object c10/CMakeFiles/c10.dir/core/StorageImpl.cpp.o [ 1%] Building CXX object c10/CMakeFiles/c10.dir/core/Stream.cpp.o [ 1%] Building CXX object c10/CMakeFiles/c10.dir/core/SymBool.cpp.o [ 1%] Building CXX object c10/CMakeFiles/c10.dir/core/SymFloat.cpp.o [ 1%] Building CXX object c10/CMakeFiles/c10.dir/core/SymInt.cpp.o [ 1%] Building CXX object c10/CMakeFiles/c10.dir/core/SymIntArrayRef.cpp.o [ 1%] Building CXX object c10/CMakeFiles/c10.dir/core/SymNodeImpl.cpp.o [ 1%] Building CXX object c10/CMakeFiles/c10.dir/core/SymbolicShapeMeta.cpp.o [ 1%] Building CXX object c10/CMakeFiles/c10.dir/core/TensorImpl.cpp.o [ 1%] Building CXX object c10/CMakeFiles/c10.dir/core/TensorOptions.cpp.o [ 1%] Building CXX object c10/CMakeFiles/c10.dir/core/UndefinedTensorImpl.cpp.o [ 1%] Building C object caffe2/CMakeFiles/torch_global_deps.dir/__/torch/csrc/empty.c.o [ 1%] Linking C shared library ../lib/libtorch_global_deps.so Warning: Unused direct dependencies: /lib64/libstdc++.so.6 /usr/local/cuda-12.3/lib64/libnvrtc.so.12 libcuda.so.1 /usr/local/cuda-12.3/lib64/libcudart.so.12 /usr/local/cuda-12.3/lib64/libnvToolsExt.so.1 [ 1%] Built target torch_global_deps [ 1%] Built target python_copy_files [ 1%] Generating /builddir/build/BUILD/pytorch/torch/csrc/autograd/generated/Functions.cpp, /builddir/build/BUILD/pytorch/torch/csrc/autograd/generated/ViewFuncs.cpp, /builddir/build/BUILD/pytorch/torch/csrc/autograd/generated/VariableType_0.cpp, /builddir/build/BUILD/pytorch/torch/csrc/autograd/generated/VariableType_1.cpp, /builddir/build/BUILD/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp, /builddir/build/BUILD/pytorch/torch/csrc/autograd/generated/VariableType_3.cpp, /builddir/build/BUILD/pytorch/torch/csrc/autograd/generated/VariableType_4.cpp, /builddir/build/BUILD/pytorch/torch/csrc/autograd/generated/TraceType_0.cpp, /builddir/build/BUILD/pytorch/torch/csrc/autograd/generated/TraceType_1.cpp, /builddir/build/BUILD/pytorch/torch/csrc/autograd/generated/TraceType_2.cpp, /builddir/build/BUILD/pytorch/torch/csrc/autograd/generated/TraceType_3.cpp, /builddir/build/BUILD/pytorch/torch/csrc/autograd/generated/TraceType_4.cpp, /builddir/build/BUILD/pytorch/torch/csrc/autograd/generated/ADInplaceOrViewType_0.cpp, /builddir/build/BUILD/pytorch/torch/csrc/autograd/generated/ADInplaceOrViewType_1.cpp, /builddir/build/BUILD/pytorch/torch/csrc/inductor/aoti_torch/generated/c_shim_cpu.cpp, /builddir/build/BUILD/pytorch/torch/csrc/lazy/generated/LazyNativeFunctions.cpp, /builddir/build/BUILD/pytorch/torch/csrc/lazy/generated/RegisterAutogradLazy.cpp, /builddir/build/BUILD/pytorch/torch/csrc/lazy/generated/RegisterLazy.cpp, /builddir/build/BUILD/pytorch/torch/csrc/autograd/generated/Functions.h, /builddir/build/BUILD/pytorch/torch/csrc/autograd/generated/variable_factories.h, /builddir/build/BUILD/pytorch/torch/csrc/autograd/generated/ViewFuncs.h, 
/builddir/build/BUILD/pytorch/torch/csrc/autograd/generated/VariableType.h, /builddir/build/BUILD/pytorch/torch/csrc/lazy/generated/LazyIr.h, /builddir/build/BUILD/pytorch/torch/csrc/lazy/generated/LazyNonNativeIr.h, /builddir/build/BUILD/pytorch/torch/csrc/lazy/generated/LazyNativeFunctions.h, /builddir/build/BUILD/pytorch/torch/csrc/autograd/generated/python_functions_0.cpp, /builddir/build/BUILD/pytorch/torch/csrc/autograd/generated/python_functions_1.cpp, /builddir/build/BUILD/pytorch/torch/csrc/autograd/generated/python_functions_2.cpp, /builddir/build/BUILD/pytorch/torch/csrc/autograd/generated/python_functions_3.cpp, /builddir/build/BUILD/pytorch/torch/csrc/autograd/generated/python_functions_4.cpp, /builddir/build/BUILD/pytorch/torch/csrc/autograd/generated/python_variable_methods.cpp, /builddir/build/BUILD/pytorch/torch/csrc/autograd/generated/python_torch_functions_0.cpp, /builddir/build/BUILD/pytorch/torch/csrc/autograd/generated/python_torch_functions_1.cpp, /builddir/build/BUILD/pytorch/torch/csrc/autograd/generated/python_torch_functions_2.cpp, /builddir/build/BUILD/pytorch/torch/csrc/autograd/generated/python_nn_functions.cpp, /builddir/build/BUILD/pytorch/torch/csrc/autograd/generated/python_fft_functions.cpp, /builddir/build/BUILD/pytorch/torch/csrc/autograd/generated/python_linalg_functions.cpp, /builddir/build/BUILD/pytorch/torch/csrc/autograd/generated/python_nested_functions.cpp, /builddir/build/BUILD/pytorch/torch/csrc/autograd/generated/python_sparse_functions.cpp, /builddir/build/BUILD/pytorch/torch/csrc/autograd/generated/python_special_functions.cpp, /builddir/build/BUILD/pytorch/torch/csrc/autograd/generated/python_return_types.cpp, /builddir/build/BUILD/pytorch/torch/csrc/autograd/generated/python_enum_tag.cpp, /builddir/build/BUILD/pytorch/torch/csrc/autograd/generated/python_functions.h, /builddir/build/BUILD/pytorch/torch/csrc/autograd/generated/python_return_types.h, /builddir/build/BUILD/pytorch/torch/testing/_internal/generated/annotated_fn_args.py, /builddir/build/BUILD/pytorch/torch/csrc/inductor/aoti_torch/generated/c_shim_cuda.cpp [ 1%] Building CXX object c10/CMakeFiles/c10.dir/core/WrapDimMinimal.cpp.o [ 1%] Building CXX object c10/CMakeFiles/c10.dir/core/impl/COW.cpp.o [ 1%] Building CXX object c10/CMakeFiles/c10.dir/core/impl/COWDeleter.cpp.o [ 1%] Building CXX object c10/CMakeFiles/c10.dir/core/impl/DeviceGuardImplInterface.cpp.o [ 1%] Building CXX object c10/CMakeFiles/c10.dir/core/impl/GPUTrace.cpp.o [ 1%] Building CXX object c10/CMakeFiles/c10.dir/core/impl/HermeticPyObjectTLS.cpp.o [ 1%] Building CXX object c10/CMakeFiles/c10.dir/core/impl/LocalDispatchKeySet.cpp.o [ 1%] Building CXX object c10/CMakeFiles/c10.dir/core/impl/PyInterpreter.cpp.o [ 1%] Building CXX object c10/CMakeFiles/c10.dir/core/impl/PyObjectSlot.cpp.o [ 1%] Built target generate-torch-sources [ 1%] Generating /builddir/build/BUILD/pytorch/torch/_C/__init__.pyi, /builddir/build/BUILD/pytorch/torch/_C/_VariableFunctions.pyi, /builddir/build/BUILD/pytorch/torch/nn/functional.pyi [ 1%] Building CXX object c10/CMakeFiles/c10.dir/core/impl/PythonDispatcherTLS.cpp.o [ 1%] Building CXX object c10/CMakeFiles/c10.dir/core/impl/SizesAndStrides.cpp.o [ 1%] Building CXX object c10/CMakeFiles/c10.dir/core/impl/TorchDispatchModeTLS.cpp.o [ 1%] Building CXX object c10/CMakeFiles/c10.dir/core/impl/alloc_cpu.cpp.o [ 1%] Building CXX object c10/CMakeFiles/c10.dir/core/thread_pool.cpp.o [ 1%] Building CXX object c10/CMakeFiles/c10.dir/mobile/CPUCachingAllocator.cpp.o [ 1%] Generating 
/builddir/build/BUILD/pytorch/torch/utils/data/datapipes/datapipe.pyi [ 1%] Built target torch_python_stubs [ 1%] Generating /builddir/build/BUILD/pytorch/torch/version.py [ 1%] Built target gen_torch_version [ 1%] Building C object confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/init.c.o [ 1%] Building C object confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/add.c.o [ 1%] Building C object confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/average-pooling.c.o [ 1%] Building C object confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/channel-shuffle.c.o [ 1%] Building C object confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/clamp.c.o [ 1%] Building CXX object confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/conv-prepack.cc.o [ 1%] Building C object confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/convolution.c.o [ 1%] Building CXX object c10/CMakeFiles/c10.dir/mobile/CPUProfilingAllocator.cpp.o [ 1%] Building C object confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/deconvolution.c.o [ 1%] Building CXX object confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/fc-prepack.cc.o [ 1%] Building C object confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/fully-connected.c.o [ 1%] Building C object confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/fully-connected-sparse.c.o [ 1%] Building C object confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/global-average-pooling.c.o [ 1%] Building C object confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/hardsigmoid.c.o [ 1%] Building C object confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/hardswish.c.o [ 1%] Building C object confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/leaky-relu.c.o [ 1%] Building C object confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/max-pooling.c.o [ 1%] Building C object confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/sigmoid.c.o [ 1%] Building C object confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/softargmax.c.o [ 1%] Building C object confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/tanh.c.o [ 1%] Building C object confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/operator-delete.c.o [ 1%] Building CXX object confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/conv-run.cc.o [ 1%] Building CXX object confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/deconv-run.cc.o [ 1%] Building CXX object confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/fc-run.cc.o [ 1%] Building CXX object c10/CMakeFiles/c10.dir/util/ApproximateClock.cpp.o [ 1%] Building CXX object confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/fc-unpack.cc.o [ 1%] Building CXX object confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/fc-dynamic-run.cc.o [ 1%] Building C object confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/indirection.c.o [ 1%] Building C object confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/operator-run.c.o [ 1%] Building C object confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/u8lut32norm/scalar.c.o [ 1%] Building C object confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/x8lut/scalar.c.o [ 1%] Building C object confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/sgemm/6x8-psimd.c.o [ 1%] Building C object 
confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/q8avgpool/mp8x9p8q-neon.c.o [ 1%] Building CXX object c10/CMakeFiles/c10.dir/util/Backtrace.cpp.o [ 1%] Building C object confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/q8avgpool/up8x9-neon.c.o [ 1%] Building C object confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/q8avgpool/up8xm-neon.c.o [ 1%] Building C object confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/q8conv/4x8-neon.c.o [ 1%] Building CXX object c10/CMakeFiles/c10.dir/util/Bfloat16.cpp.o [ 1%] Building C object confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/q8conv/8x8-neon.c.o [ 1%] Building C object confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/q8dwconv/mp8x25-neon.c.o [ 1%] Building CXX object c10/CMakeFiles/c10.dir/util/C++17.cpp.o [ 1%] Building CXX object c10/CMakeFiles/c10.dir/util/DeadlockDetection.cpp.o [ 2%] Building C object confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/q8dwconv/mp8x25-neon-per-channel.c.o [ 2%] Building C object confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/q8dwconv/mp8x27-neon.c.o [ 2%] Building CXX object c10/CMakeFiles/c10.dir/util/Exception.cpp.o [ 2%] Building C object confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/q8dwconv/up8x9-neon.c.o [ 2%] Building C object confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/q8dwconv/up8x9-neon-per-channel.c.o [ 2%] Building C object confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/q8gavgpool/mp8x7p7q-neon.c.o [ 2%] Building C object confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/q8gavgpool/up8x7-neon.c.o [ 2%] Building CXX object c10/CMakeFiles/c10.dir/util/Float8_e4m3fn.cpp.o [ 2%] Building C object confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/q8gavgpool/up8xm-neon.c.o [ 2%] Building C object confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/q8gemm/4x-sumrows-neon.c.o [ 2%] Building CXX object c10/CMakeFiles/c10.dir/util/Float8_e4m3fnuz.cpp.o [ 2%] Building C object confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/q8gemm/4x8-neon.c.o [ 2%] Building CXX object c10/CMakeFiles/c10.dir/util/Float8_e5m2.cpp.o [ 2%] Building C object confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/q8gemm/4x8-dq-neon.c.o [ 2%] Building C object confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/q8gemm/4x8c2-xzp-neon.c.o [ 2%] Building CXX object c10/CMakeFiles/c10.dir/util/Float8_e5m2fnuz.cpp.o [ 2%] Building C object confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/q8gemm/6x4-neon.c.o [ 2%] Building CXX object c10/CMakeFiles/c10.dir/util/Half.cpp.o [ 2%] Building C object confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/q8gemm/8x8-neon.c.o [ 2%] Building C object confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/q8vadd/neon.c.o [ 2%] Building C object confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/sgemm/5x8-neon.c.o [ 2%] Building CXX object c10/CMakeFiles/c10.dir/util/LeftRight.cpp.o [ 2%] Building CXX object c10/CMakeFiles/c10.dir/util/Logging.cpp.o [ 2%] Building C object confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/sgemm/6x8-neon.c.o [ 2%] Building C object confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/u8clamp/neon.c.o [ 2%] Building C object confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/u8maxpool/16x9p8q-neon.c.o [ 2%] Building C object 
confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/u8maxpool/sub16-neon.c.o [ 2%] Building C object confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/u8rmax/neon.c.o [ 2%] Building CXX object c10/CMakeFiles/c10.dir/util/MathConstants.cpp.o [ 2%] Building C object confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/x8zip/x2-neon.c.o [ 2%] Building C object confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/x8zip/x3-neon.c.o [ 2%] Building C object confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/x8zip/x4-neon.c.o [ 2%] Building CXX object c10/CMakeFiles/c10.dir/util/Metaprogramming.cpp.o [ 2%] Building C object confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/x8zip/xm-neon.c.o [ 2%] Building CXX object c10/CMakeFiles/c10.dir/util/Optional.cpp.o [ 2%] Building ASM object confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/q8conv/8x8-aarch64-neon.S.o [ 2%] Building CXX object c10/CMakeFiles/c10.dir/util/ParallelGuard.cpp.o [ 2%] Building ASM object confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/q8gemm/8x8-aarch64-neon.S.o [ 2%] Building CXX object c10/CMakeFiles/c10.dir/util/SmallVector.cpp.o [ 2%] Building ASM object confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/q8gemm/8x8-dq-aarch64-neon.S.o [ 2%] Building ASM object confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/q8gemm_sparse/8x4-packA-aarch64-neon.S.o [ 2%] Building ASM object confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/q8gemm_sparse/8x8c1x4-dq-packedA-aarch64-neon.S.o [ 2%] Building ASM object confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/q8gemm_sparse/8x8c8x1-dq-packedA-aarch64-neon.S.o [ 2%] Linking CXX static library ../../lib/libpytorch_qnnpack.a [ 2%] Built target pytorch_qnnpack [ 2%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/cs16-bfly4/cs16-bfly4-samples1-scalar.c.o [ 2%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/cs16-bfly4/cs16-bfly4-samples4-scalar.c.o [ 2%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/cs16-bfly4/gen/cs16-bfly4-scalar-x1.c.o [ 2%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/cs16-bfly4/gen/cs16-bfly4-scalar-x2.c.o [ 2%] Building CXX object c10/CMakeFiles/c10.dir/util/StringUtil.cpp.o [ 2%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/cs16-bfly4/gen/cs16-bfly4-scalar-x4.c.o [ 2%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/cs16-fftr/gen/cs16-fftr-scalar-x1.c.o [ 2%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/cs16-fftr/gen/cs16-fftr-scalar-x2.c.o [ 2%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/cs16-fftr/gen/cs16-fftr-scalar-x4.c.o [ 2%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/cs16-vsquareabs/gen/cs16-vsquareabs-scalar-x1.c.o [ 2%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/cs16-vsquareabs/gen/cs16-vsquareabs-scalar-x2.c.o [ 2%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/cs16-vsquareabs/gen/cs16-vsquareabs-scalar-x3.c.o [ 2%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/cs16-vsquareabs/gen/cs16-vsquareabs-scalar-x4.c.o [ 2%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-f32-vcvt/gen/f16-f32-vcvt-scalar-u1.c.o [ 2%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-f32-vcvt/gen/f16-f32-vcvt-scalar-u2.c.o
[ 2%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-f32-vcvt/gen/f16-f32-vcvt-scalar-u3.c.o
[ 2%] Building CXX object c10/CMakeFiles/c10.dir/util/ThreadLocalDebugInfo.cpp.o
[ 2%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-f32-vcvt/gen/f16-f32-vcvt-scalar-u4.c.o
[ 2%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-qs8-vcvt/gen/f16-qs8-vcvt-scalar-fmagic-u1.c.o
[ 2%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-qs8-vcvt/gen/f16-qs8-vcvt-scalar-fmagic-u2.c.o
[ 2%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-qs8-vcvt/gen/f16-qs8-vcvt-scalar-fmagic-u3.c.o
[ 2%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-qs8-vcvt/gen/f16-qs8-vcvt-scalar-fmagic-u4.c.o
[ 2%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-qs8-vcvt/gen/f16-qs8-vcvt-scalar-imagic-u1.c.o
[ 2%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-qs8-vcvt/gen/f16-qs8-vcvt-scalar-imagic-u2.c.o
[ 2%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-qs8-vcvt/gen/f16-qs8-vcvt-scalar-imagic-u3.c.o
[ 2%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-qs8-vcvt/gen/f16-qs8-vcvt-scalar-imagic-u4.c.o
[ 2%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-rminmax/gen/f16-rmax-scalar-u1.c.o
[ 2%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-rminmax/gen/f16-rmax-scalar-u2-acc2.c.o
[ 2%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-rminmax/gen/f16-rmax-scalar-u3-acc3.c.o
[ 2%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-rminmax/gen/f16-rmax-scalar-u4-acc2.c.o
[ 2%] Building CXX object c10/CMakeFiles/c10.dir/util/TypeCast.cpp.o
[ 2%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-rminmax/gen/f16-rmax-scalar-u4-acc4.c.o
[ 2%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-rminmax/gen/f16-rmin-scalar-u1.c.o
[ 2%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-rminmax/gen/f16-rmin-scalar-u2-acc2.c.o
[ 2%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-rminmax/gen/f16-rmin-scalar-u3-acc3.c.o
[ 2%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-rminmax/gen/f16-rmin-scalar-u4-acc2.c.o
[ 2%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-rminmax/gen/f16-rmin-scalar-u4-acc4.c.o
[ 3%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-rminmax/gen/f16-rminmax-scalar-u1.c.o
[ 3%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-rminmax/gen/f16-rminmax-scalar-u2-acc2.c.o
[ 3%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-rminmax/gen/f16-rminmax-scalar-u3-acc3.c.o
[ 3%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-rminmax/gen/f16-rminmax-scalar-u4-acc2.c.o
[ 3%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-rminmax/gen/f16-rminmax-scalar-u4-acc4.c.o
[ 3%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-argmaxpool/f32-argmaxpool-4x-scalar-c1.c.o
[ 3%] Building CXX object c10/CMakeFiles/c10.dir/util/TypeList.cpp.o
[ 3%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-argmaxpool/f32-argmaxpool-9p8x-scalar-c1.c.o
[ 3%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-argmaxpool/f32-argmaxpool-9x-scalar-c1.c.o
[ 3%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-avgpool/f32-avgpool-9p8x-minmax-scalar-c1.c.o
[ 3%] Building CXX object c10/CMakeFiles/c10.dir/util/TypeTraits.cpp.o
[ 3%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-avgpool/f32-avgpool-9x-minmax-scalar-c1.c.o
[ 3%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-conv-hwc2chw/f32-conv-hwc2chw-3x3s2p1c3x4-scalar-1x1.c.o
[ 3%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-conv-hwc/f32-conv-hwc-3x3s2p0p1c3x4-scalar-1x1.c.o
[ 3%] Building CXX object c10/CMakeFiles/c10.dir/util/Type_demangle.cpp.o
[ 3%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-conv-hwc/f32-conv-hwc-3x3s2p1c3x4-scalar-1x1.c.o
[ 3%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-3x3p1-minmax-scalar-1x1-acc2.c.o
[ 3%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-3x3p1-minmax-scalar-1x1-acc3.c.o
[ 3%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-3x3p1-minmax-scalar-1x1-acc4.c.o
[ 3%] Building CXX object c10/CMakeFiles/c10.dir/util/Type_no_demangle.cpp.o
[ 3%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-3x3p1-minmax-scalar-1x1.c.o
[ 3%] Building CXX object c10/CMakeFiles/c10.dir/util/Unicode.cpp.o
[ 3%] Building CXX object c10/CMakeFiles/c10.dir/util/UniqueVoidPtr.cpp.o
[ 3%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-3x3p1-minmax-scalar-2x1-acc2.c.o
[ 3%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-3x3p1-minmax-scalar-2x1.c.o
[ 3%] Building CXX object c10/CMakeFiles/c10.dir/util/complex_math.cpp.o
[ 3%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-3x3p1-minmax-scalar-3x1.c.o
[ 3%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-3x3p1-minmax-scalar-4x1.c.o
[ 3%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-3x3p1-minmax-scalar-5x1.c.o
[ 3%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-3x3p1-minmax-scalar-6x1.c.o
[ 3%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-3x3s2p1-minmax-scalar-1x1-acc2.c.o
[ 4%] Building CXX object c10/CMakeFiles/c10.dir/util/flags_use_gflags.cpp.o
[ 4%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-3x3s2p1-minmax-scalar-1x1-acc3.c.o
[ 4%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-3x3s2p1-minmax-scalar-1x1-acc4.c.o
[ 4%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-3x3s2p1-minmax-scalar-1x1.c.o
[ 4%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-3x3s2p1-minmax-scalar-2x1-acc2.c.o
[ 4%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-3x3s2p1-minmax-scalar-2x1.c.o
[ 4%] Building CXX object c10/CMakeFiles/c10.dir/util/flags_use_no_gflags.cpp.o
[ 4%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-3x3s2p1-minmax-scalar-3x1.c.o
[ 4%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-3x3s2p1-minmax-scalar-4x1.c.o
[ 4%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-5x5p2-minmax-scalar-1x1-acc2.c.o
[ 4%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-5x5p2-minmax-scalar-1x1-acc3.c.o
[ 4%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-5x5p2-minmax-scalar-1x1-acc4.c.o
[ 4%] Building CXX object c10/CMakeFiles/c10.dir/util/int128.cpp.o
[ 4%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-5x5p2-minmax-scalar-1x1-acc5.c.o
[ 4%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-5x5p2-minmax-scalar-1x1.c.o
[ 4%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-5x5p2-minmax-scalar-2x1-acc2.c.o
[ 4%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-5x5p2-minmax-scalar-2x1-acc3.c.o
[ 4%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-5x5p2-minmax-scalar-2x1.c.o
[ 4%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-5x5p2-minmax-scalar-3x1-acc2.c.o
[ 4%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-5x5p2-minmax-scalar-3x1.c.o
[ 4%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-5x5s2p2-minmax-scalar-1x1-acc2.c.o
[ 4%] Building CXX object c10/CMakeFiles/c10.dir/util/intrusive_ptr.cpp.o
[ 4%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-5x5s2p2-minmax-scalar-1x1-acc3.c.o
[ 4%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-5x5s2p2-minmax-scalar-1x1-acc4.c.o
[ 4%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-5x5s2p2-minmax-scalar-1x1-acc5.c.o
[ 4%] Built target ATEN_CPU_FILES_GEN_TARGET
[ 4%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/amalgam/gen/scalar.c.o
[ 4%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-5x5s2p2-minmax-scalar-1x1.c.o
[ 4%] Built target ATEN_CUDA_FILES_GEN_TARGET
[ 4%] Building C object confu-deps/XNNPACK/CMakeFiles/hardware-config.dir/src/configs/hardware-config.c.o
[ 4%] Built target hardware-config
[ 4%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-5x5s2p2-minmax-scalar-2x1-acc2.c.o
[ 4%] Building C object confu-deps/XNNPACK/CMakeFiles/indirection.dir/src/indirection.c.o
[ 4%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-5x5s2p2-minmax-scalar-2x1-acc3.c.o
[ 4%] Building CXX object c10/CMakeFiles/c10.dir/util/numa.cpp.o
[ 4%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-5x5s2p2-minmax-scalar-2x1.c.o
[ 4%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-5x5s2p2-minmax-scalar-3x1-acc2.c.o
[ 4%] Built target indirection
[ 4%] Building CXX object confu-deps/XNNPACK/CMakeFiles/jit.dir/src/jit/aarch32-assembler.cc.o
[ 4%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-5x5s2p2-minmax-scalar-3x1.c.o
[ 4%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-2f2m2l1c1s1r-minmax-scalar-acc2.c.o
[ 4%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-2f2m2l1c1s1r-minmax-scalar.c.o
[ 4%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-2f2m2l1c1s1r-scalar-acc2.c.o
[ 4%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-2f2m2l1c1s1r-scalar.c.o
[ 4%] Building CXX object c10/CMakeFiles/c10.dir/util/signal_handler.cpp.o
[ 4%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-2f2m2l4c1s1r-minmax-scalar-acc2.c.o
[ 4%] Building CXX object confu-deps/XNNPACK/CMakeFiles/jit.dir/src/jit/aarch64-assembler.cc.o
[ 4%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-2f2m2l4c1s1r-minmax-scalar.c.o
[ 4%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-2f2m2l4c1s1r-scalar-acc2.c.o
[ 4%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-2f2m2l4c1s1r-scalar.c.o
[ 4%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-3f3m3l1c1s1r-scalar-acc2.c.o
[ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-3f3m3l1c1s1r-scalar.c.o
[ 5%] Building CXX object confu-deps/XNNPACK/CMakeFiles/jit.dir/src/jit/assembler.cc.o
[ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-3p1c-minmax-scalar-acc2.c.o
[ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-3p1c-minmax-scalar.c.o
[ 5%] Building CXX object confu-deps/XNNPACK/CMakeFiles/jit.dir/src/f16-gemm/gen/f16-gemm-1x16-aarch64-neonfp16arith-ld64.cc.o
[ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-3p1c-scalar-acc2.c.o
[ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-3p1c-scalar.c.o
[ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-3p2c-minmax-scalar-acc2.c.o
[ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-3p2c-minmax-scalar.c.o
[ 5%] Building CXX object confu-deps/XNNPACK/CMakeFiles/jit.dir/src/f16-gemm/gen/f16-gemm-4x16-aarch64-neonfp16arith-ld64.cc.o
[ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-3p2c-scalar-acc2.c.o
[ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-3p2c-scalar.c.o
[ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-4p1c-minmax-scalar-acc2.c.o
[ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-4p1c-minmax-scalar.c.o
[ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-4p1c-scalar-acc2.c.o
[ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-4p1c-scalar.c.o
[ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-4p2c-minmax-scalar-acc2.c.o
[ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-4p2c-minmax-scalar.c.o
[ 5%] Building CXX object confu-deps/XNNPACK/CMakeFiles/jit.dir/src/f16-gemm/gen/f16-gemm-6x16-aarch64-neonfp16arith-cortex-a55.cc.o
[ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-4p2c-scalar-acc2.c.o
[ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-4p2c-scalar.c.o
[ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-5f5m5l1c1s1r-minmax-scalar-acc2.c.o
[ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-5f5m5l1c1s1r-minmax-scalar.c.o
[ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-5f5m5l1c1s1r-scalar-acc2.c.o
[ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-5f5m5l1c1s1r-scalar.c.o
[ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-6f6m7l1c1s1r-minmax-scalar-acc2.c.o
[ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-6f6m7l1c1s1r-minmax-scalar.c.o
[ 5%] Building CXX object confu-deps/XNNPACK/CMakeFiles/jit.dir/src/f16-gemm/gen/f16-gemm-6x16-aarch64-neonfp16arith-cortex-a55r0.cc.o
[ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-6f6m7l1c1s1r-scalar-acc2.c.o
[ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-6f6m7l1c1s1r-scalar.c.o
[ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-8f8m9l1c1s1r-minmax-scalar-acc2.c.o
[ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-8f8m9l1c1s1r-minmax-scalar.c.o
[ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-8f8m9l1c1s1r-scalar-acc2.c.o
[ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-8f8m9l1c1s1r-scalar.c.o
[ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-9p1c-minmax-scalar-acc2.c.o
[ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-9p1c-minmax-scalar.c.o
[ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-9p1c-scalar-acc2.c.o
[ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-9p1c-scalar.c.o
[ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-9p2c-minmax-scalar-acc2.c.o
[ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-9p2c-minmax-scalar.c.o
[ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-9p2c-scalar-acc2.c.o
[ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-9p2c-scalar.c.o
[ 5%] Building CXX object confu-deps/XNNPACK/CMakeFiles/jit.dir/src/f16-gemm/gen/f16-gemm-6x16-aarch64-neonfp16arith-cortex-a75.cc.o
[ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-25p1c-minmax-scalar-acc2.c.o
[ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-25p1c-minmax-scalar.c.o
[ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-25p1c-scalar-acc2.c.o
[ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-25p1c-scalar.c.o
[ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-25p2c-minmax-scalar-acc2.c.o
[ 5%] Building CXX object c10/CMakeFiles/c10.dir/util/tempfile.cpp.o
[ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-25p2c-minmax-scalar.c.o
[ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-25p2c-scalar-acc2.c.o
[ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/amalgam/gen/neon.c.o
[ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-25p2c-scalar.c.o
[ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-f16-vcvt/gen/f32-f16-vcvt-scalar-bitcast-u1.c.o
[ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-f16-vcvt/gen/f32-f16-vcvt-scalar-bitcast-u2.c.o
[ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-f16-vcvt/gen/f32-f16-vcvt-scalar-bitcast-u3.c.o
[ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-f16-vcvt/gen/f32-f16-vcvt-scalar-bitcast-u4.c.o
[ 5%] Building CXX object confu-deps/XNNPACK/CMakeFiles/jit.dir/src/f16-gemm/gen/f16-gemm-6x16-aarch64-neonfp16arith-ld64.cc.o
[ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-f16-vcvt/gen/f32-f16-vcvt-scalar-fabsf-u1.c.o
[ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-f16-vcvt/gen/f32-f16-vcvt-scalar-fabsf-u2.c.o
[ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-f16-vcvt/gen/f32-f16-vcvt-scalar-fabsf-u3.c.o
[ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-f16-vcvt/gen/f32-f16-vcvt-scalar-fabsf-u4.c.o
[ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gavgpool-cw/f32-gavgpool-cw-scalar-u1.c.o
[ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gavgpool/f32-gavgpool-7p7x-minmax-scalar-c1.c.o
[ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gavgpool/f32-gavgpool-7x-minmax-scalar-c1.c.o
[ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-1x4-minmax-scalar.c.o
[ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-1x4-relu-scalar.c.o
[ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-1x4-scalar.c.o
[ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-2x4-minmax-scalar.c.o
[ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-2x4-relu-scalar.c.o
[ 6%] Building CXX object confu-deps/XNNPACK/CMakeFiles/jit.dir/src/f16-igemm/gen/f16-igemm-1x16-aarch64-neonfp16arith-ld64.cc.o
[ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-2x4-scalar.c.o
[ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-4x2-minmax-scalar.c.o
[ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-4x2-relu-scalar.c.o
[ 6%] Building CXX object confu-deps/XNNPACK/CMakeFiles/jit.dir/src/f16-igemm/gen/f16-igemm-4x16-aarch64-neonfp16arith-ld64.cc.o
[ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-4x2-scalar.c.o
[ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-4x4-minmax-scalar.c.o
[ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-4x4-relu-scalar.c.o
[ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-4x4-scalar.c.o
[ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemminc/gen/f32-gemminc-1x4-minmax-scalar.c.o
[ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemminc/gen/f32-gemminc-2x4-minmax-scalar.c.o
[ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemminc/gen/f32-gemminc-4x4-minmax-scalar.c.o
[ 6%] Building CXX object confu-deps/XNNPACK/CMakeFiles/jit.dir/src/f16-igemm/gen/f16-igemm-6x16-aarch64-neonfp16arith-cortex-a55.cc.o
[ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-ibilinear-chw/gen/f32-ibilinear-chw-scalar-p1.c.o
[ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-ibilinear-chw/gen/f32-ibilinear-chw-scalar-p2.c.o
[ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-ibilinear-chw/gen/f32-ibilinear-chw-scalar-p4.c.o
[ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-ibilinear/gen/f32-ibilinear-scalar-c1.c.o
[ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-ibilinear/gen/f32-ibilinear-scalar-c2.c.o
[ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-ibilinear/gen/f32-ibilinear-scalar-c4.c.o
[ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-1x4-minmax-scalar.c.o
[ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-1x4-relu-scalar.c.o
[ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-1x4-scalar.c.o
[ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-2x4-minmax-scalar.c.o
[ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-2x4-relu-scalar.c.o
[ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-2x4-scalar.c.o
[ 6%] Building CXX object confu-deps/XNNPACK/CMakeFiles/jit.dir/src/f16-igemm/gen/f16-igemm-6x16-aarch64-neonfp16arith-cortex-a55r0.cc.o
[ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-4x2-minmax-scalar.c.o
[ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-4x2-relu-scalar.c.o
[ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-4x2-scalar.c.o
[ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-4x4-minmax-scalar.c.o
[ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-4x4-relu-scalar.c.o
[ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-4x4-scalar.c.o
[ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-maxpool/f32-maxpool-9p8x-minmax-scalar-c1.c.o
[ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-pavgpool/f32-pavgpool-9p8x-minmax-scalar-c1.c.o
[ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-pavgpool/f32-pavgpool-9x-minmax-scalar-c1.c.o
[ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-ppmm/gen/f32-ppmm-2x4-minmax-scalar.c.o
[ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-ppmm/gen/f32-ppmm-3x3-minmax-scalar.c.o
[ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-ppmm/gen/f32-ppmm-4x2-minmax-scalar.c.o
[ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-ppmm/gen/f32-ppmm-4x4-minmax-scalar.c.o
[ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-prelu/gen/f32-prelu-scalar-2x1.c.o
[ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-prelu/gen/f32-prelu-scalar-2x4.c.o
[ 6%] Building CXX object c10/CMakeFiles/c10.dir/util/thread_name.cpp.o
[ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc4w-gemm/gen/f32-qc4w-gemm-1x4-minmax-scalar.c.o
[ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc4w-gemm/gen/f32-qc4w-gemm-2x4-minmax-scalar.c.o
[ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc4w-gemm/gen/f32-qc4w-gemm-4x2-minmax-scalar.c.o
[ 6%] Building CXX object c10/CMakeFiles/c10.dir/util/typeid.cpp.o
[ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc4w-gemm/gen/f32-qc4w-gemm-4x4-minmax-scalar.c.o
[ 6%] Building CXX object confu-deps/XNNPACK/CMakeFiles/jit.dir/src/f16-igemm/gen/f16-igemm-6x16-aarch64-neonfp16arith-cortex-a75.cc.o
[ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-1x4-minmax-scalar.c.o
[ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-1x4-relu-scalar.c.o
[ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-1x4-scalar.c.o
[ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-2x4-minmax-scalar.c.o
[ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-2x4-relu-scalar.c.o
[ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-2x4-scalar.c.o
[ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-4x2-minmax-scalar.c.o
[ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-4x2-relu-scalar.c.o
[ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-4x2-scalar.c.o
[ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-4x4-minmax-scalar.c.o
[ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-4x4-relu-scalar.c.o
[ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-4x4-scalar.c.o
[ 6%] Building CXX object confu-deps/XNNPACK/CMakeFiles/jit.dir/src/f16-igemm/gen/f16-igemm-6x16-aarch64-neonfp16arith-ld64.cc.o
[ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-spmm/gen/f32-qc8w-spmm-1x1-minmax-scalar.c.o
[ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-spmm/gen/f32-qc8w-spmm-2x1-minmax-scalar.c.o
[ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-spmm/gen/f32-qc8w-spmm-4x1-minmax-scalar.c.o
[ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-spmm/gen/f32-qc8w-spmm-8x1-minmax-scalar.c.o
[ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-spmm/gen/f32-qc8w-spmm-8x2-minmax-scalar.c.o
[ 7%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-spmm/gen/f32-qc8w-spmm-8x4-minmax-scalar.c.o
[ 7%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qs8-vcvt/gen/f32-qs8-vcvt-scalar-fmagic-u1.c.o
[ 7%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qs8-vcvt/gen/f32-qs8-vcvt-scalar-fmagic-u2.c.o
[ 7%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qs8-vcvt/gen/f32-qs8-vcvt-scalar-fmagic-u3.c.o
[ 7%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qs8-vcvt/gen/f32-qs8-vcvt-scalar-fmagic-u4.c.o
[ 7%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qs8-vcvt/gen/f32-qs8-vcvt-scalar-imagic-u1.c.o
[ 7%] Linking CXX shared library ../lib/libc10.so
[ 7%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qs8-vcvt/gen/f32-qs8-vcvt-scalar-imagic-u2.c.o
[ 7%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qs8-vcvt/gen/f32-qs8-vcvt-scalar-imagic-u3.c.o
[ 7%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qs8-vcvt/gen/f32-qs8-vcvt-scalar-imagic-u4.c.o
[ 7%] Building CXX object confu-deps/XNNPACK/CMakeFiles/jit.dir/src/f32-gemm/gen/f32-gemm-1x8-aarch64-neonfma-cortex-a53.cc.o
[ 7%] Built target c10
[ 7%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qs8-vcvt/gen/f32-qs8-vcvt-scalar-lrintf-u1.c.o
[ 7%] Building C object confu-deps/XNNPACK/CMakeFiles/microparams-init.dir/src/microparams-init.c.o
[ 7%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qs8-vcvt/gen/f32-qs8-vcvt-scalar-lrintf-u2.c.o
[ 7%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/amalgam/gen/neonfp16.c.o
[ 7%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qs8-vcvt/gen/f32-qs8-vcvt-scalar-lrintf-u3.c.o
[ 7%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qs8-vcvt/gen/f32-qs8-vcvt-scalar-lrintf-u4.c.o
[ 7%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qu8-vcvt/gen/f32-qu8-vcvt-scalar-fmagic-u1.c.o
[ 7%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qu8-vcvt/gen/f32-qu8-vcvt-scalar-fmagic-u2.c.o
[ 7%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/amalgam/gen/neonfma.c.o
[ 7%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qu8-vcvt/gen/f32-qu8-vcvt-scalar-fmagic-u3.c.o
[ 7%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qu8-vcvt/gen/f32-qu8-vcvt-scalar-fmagic-u4.c.o
[ 7%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qu8-vcvt/gen/f32-qu8-vcvt-scalar-imagic-u1.c.o
[ 7%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qu8-vcvt/gen/f32-qu8-vcvt-scalar-imagic-u2.c.o
[ 7%] Built target microparams-init
[ 7%] Building C object confu-deps/XNNPACK/CMakeFiles/packing.dir/src/packing.c.o
[ 7%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qu8-vcvt/gen/f32-qu8-vcvt-scalar-imagic-u3.c.o
[ 7%] Building CXX object confu-deps/XNNPACK/CMakeFiles/jit.dir/src/f32-gemm/gen/f32-gemm-1x8-aarch64-neonfma-cortex-a75.cc.o
[ 7%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qu8-vcvt/gen/f32-qu8-vcvt-scalar-imagic-u4.c.o
[ 7%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qu8-vcvt/gen/f32-qu8-vcvt-scalar-lrintf-u1.c.o
[ 7%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qu8-vcvt/gen/f32-qu8-vcvt-scalar-lrintf-u2.c.o
[ 7%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qu8-vcvt/gen/f32-qu8-vcvt-scalar-lrintf-u3.c.o
[ 7%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qu8-vcvt/gen/f32-qu8-vcvt-scalar-lrintf-u4.c.o
[ 7%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddstoreexpminusmax/gen/f32-raddstoreexpminusmax-scalar-rr2-lut64-p2-u1.c.o
[ 7%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddstoreexpminusmax/gen/f32-raddstoreexpminusmax-scalar-rr2-lut64-p2-u2-acc2.c.o
[ 7%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddstoreexpminusmax/gen/f32-raddstoreexpminusmax-scalar-rr2-lut64-p2-u2.c.o
[ 7%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddstoreexpminusmax/gen/f32-raddstoreexpminusmax-scalar-rr2-lut64-p2-u4-acc2.c.o
[ 7%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddstoreexpminusmax/gen/f32-raddstoreexpminusmax-scalar-rr2-lut64-p2-u4-acc4.c.o
[ 7%] Building CXX object confu-deps/XNNPACK/CMakeFiles/jit.dir/src/f32-gemm/gen/f32-gemm-1x8-aarch64-neonfma-ld64.cc.o
[ 7%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddstoreexpminusmax/gen/f32-raddstoreexpminusmax-scalar-rr2-lut64-p2-u4.c.o
[ 7%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddstoreexpminusmax/gen/f32-raddstoreexpminusmax-scalar-rr2-p5-u1.c.o
[ 7%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddstoreexpminusmax/gen/f32-raddstoreexpminusmax-scalar-rr2-p5-u2-acc2.c.o
[ 7%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddstoreexpminusmax/gen/f32-raddstoreexpminusmax-scalar-rr2-p5-u2.c.o
[ 7%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddstoreexpminusmax/gen/f32-raddstoreexpminusmax-scalar-rr2-p5-u4-acc2.c.o
[ 8%] Building CXX object confu-deps/XNNPACK/CMakeFiles/jit.dir/src/f32-gemm/gen/f32-gemm-4x8-aarch64-neonfma-cortex-a53.cc.o
[ 8%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddstoreexpminusmax/gen/f32-raddstoreexpminusmax-scalar-rr2-p5-u4-acc4.c.o
[ 8%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddstoreexpminusmax/gen/f32-raddstoreexpminusmax-scalar-rr2-p5-u4.c.o
[ 8%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-rminmax/gen/f32-rmax-scalar-u1.c.o
[ 8%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-rminmax/gen/f32-rmax-scalar-u2-acc2.c.o
[ 8%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-rminmax/gen/f32-rmax-scalar-u3-acc3.c.o
[ 8%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/amalgam/gen/neonv8.c.o
[ 8%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-rminmax/gen/f32-rmax-scalar-u4-acc2.c.o
[ 8%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-rminmax/gen/f32-rmax-scalar-u4-acc4.c.o
[ 8%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-rminmax/gen/f32-rmin-scalar-u1.c.o
[ 8%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-rminmax/gen/f32-rmin-scalar-u2-acc2.c.o
[ 8%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-rminmax/gen/f32-rmin-scalar-u3-acc3.c.o
[ 8%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-rminmax/gen/f32-rmin-scalar-u4-acc2.c.o
[ 8%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-rminmax/gen/f32-rmin-scalar-u4-acc4.c.o
[ 8%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-rminmax/gen/f32-rminmax-scalar-u1.c.o
[ 8%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-rminmax/gen/f32-rminmax-scalar-u2-acc2.c.o
[ 8%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-rminmax/gen/f32-rminmax-scalar-u3-acc3.c.o
[ 8%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-rminmax/gen/f32-rminmax-scalar-u4-acc2.c.o
[ 8%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-rminmax/gen/f32-rminmax-scalar-u4-acc4.c.o
[ 8%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-rsum/gen/f32-rsum-scalar-u1.c.o
[ 8%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-rsum/gen/f32-rsum-scalar-u2-acc2.c.o
[ 8%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-rsum/gen/f32-rsum-scalar-u3-acc3.c.o
[ 8%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-rsum/gen/f32-rsum-scalar-u4-acc2.c.o
[ 8%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-rsum/gen/f32-rsum-scalar-u4-acc4.c.o
[ 8%] Building CXX object confu-deps/XNNPACK/CMakeFiles/jit.dir/src/f32-gemm/gen/f32-gemm-4x8-aarch64-neonfma-cortex-a55.cc.o
[ 8%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-spmm/gen/f32-spmm-1x1-minmax-scalar-pipelined.c.o
[ 8%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-spmm/gen/f32-spmm-1x1-minmax-scalar.c.o
[ 8%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-spmm/gen/f32-spmm-2x1-minmax-scalar-pipelined.c.o
[ 8%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/amalgam/gen/neon-aarch64.c.o
[ 9%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-spmm/gen/f32-spmm-2x1-minmax-scalar.c.o
[ 9%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-spmm/gen/f32-spmm-4x1-minmax-scalar-pipelined.c.o
[ 9%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-spmm/gen/f32-spmm-4x1-minmax-scalar.c.o
[ 9%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-spmm/gen/f32-spmm-8x1-minmax-scalar-pipelined.c.o
[ 9%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-spmm/gen/f32-spmm-8x1-minmax-scalar.c.o
[ 9%] Built target packing
[ 9%] Building C object confu-deps/XNNPACK/CMakeFiles/memory.dir/src/memory.c.o
[ 9%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-spmm/gen/f32-spmm-8x2-minmax-scalar.c.o
[ 9%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/amalgam/gen/neonfma-aarch64.c.o
[ 9%] Built target memory
[ 9%] Building C object confu-deps/XNNPACK/CMakeFiles/mutex.dir/src/mutex.c.o
[ 9%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-spmm/gen/f32-spmm-8x4-minmax-scalar.c.o
[ 9%] Built target mutex
[ 9%] Building C object confu-deps/XNNPACK/CMakeFiles/post-operation.dir/src/operators/post-operation.c.o
[ 9%] Built target post-operation
[ 9%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vadd-minmax-scalar-u1.c.o
[ 9%] Building C object confu-deps/XNNPACK/CMakeFiles/operator-utils.dir/src/operator-utils.c.o
[ 9%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vadd-minmax-scalar-u2.c.o
[ 9%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vadd-minmax-scalar-u4.c.o
[ 9%] Built target operator-utils
[ 9%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vadd-minmax-scalar-u8.c.o
[ 9%] Building C object confu-deps/XNNPACK/CMakeFiles/operator-run.dir/src/operator-run.c.o
[ 9%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vadd-relu-scalar-u1.c.o
[ 9%] Building CXX object confu-deps/XNNPACK/CMakeFiles/jit.dir/src/f32-gemm/gen/f32-gemm-4x8-aarch64-neonfma-cortex-a75.cc.o
[ 9%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vadd-relu-scalar-u2.c.o
[ 9%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vadd-relu-scalar-u4.c.o
[ 9%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vadd-relu-scalar-u8.c.o
[ 9%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vadd-scalar-u1.c.o
[ 9%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vadd-scalar-u2.c.o
[ 9%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vadd-scalar-u4.c.o
[ 9%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vadd-scalar-u8.c.o
[ 9%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vaddc-minmax-scalar-u1.c.o
[ 9%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vaddc-minmax-scalar-u2.c.o
[ 9%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vaddc-minmax-scalar-u4.c.o
[ 9%] Built target operator-run
[ 9%] Building CXX object c10/cuda/CMakeFiles/c10_cuda.dir/CUDAAllocatorConfig.cpp.o
[ 9%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vaddc-minmax-scalar-u8.c.o
[ 9%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vaddc-relu-scalar-u1.c.o
[ 9%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vaddc-relu-scalar-u2.c.o
[ 9%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vaddc-relu-scalar-u4.c.o
[ 9%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vaddc-relu-scalar-u8.c.o
[ 9%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vaddc-scalar-u1.c.o
[ 9%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/amalgam/gen/neonfp16arith.c.o
[ 9%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vaddc-scalar-u2.c.o
[ 9%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vaddc-scalar-u4.c.o
[ 9%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vaddc-scalar-u8.c.o
[ 9%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vdiv-minmax-scalar-u1.c.o
[ 9%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vdiv-minmax-scalar-u2.c.o
[ 9%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vdiv-minmax-scalar-u4.c.o
[ 9%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vdiv-minmax-scalar-u8.c.o
[ 9%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vdiv-relu-scalar-u1.c.o
[ 9%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vdiv-relu-scalar-u2.c.o
[ 9%] Building CXX object confu-deps/XNNPACK/CMakeFiles/jit.dir/src/f32-gemm/gen/f32-gemm-4x8-aarch64-neonfma-ld128.cc.o
[ 9%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vdiv-relu-scalar-u4.c.o
[ 9%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vdiv-relu-scalar-u8.c.o
[ 9%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vdiv-scalar-u1.c.o
[ 9%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vdiv-scalar-u2.c.o
[ 9%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vdiv-scalar-u4.c.o
[ 9%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vdiv-scalar-u8.c.o
[ 9%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vdivc-minmax-scalar-u1.c.o
[ 9%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vdivc-minmax-scalar-u2.c.o
[ 9%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vdivc-minmax-scalar-u4.c.o
[ 9%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vdivc-minmax-scalar-u8.c.o
[ 9%] Building CXX object confu-deps/XNNPACK/CMakeFiles/jit.dir/src/f32-gemm/gen/f32-gemm-6x8-aarch64-neonfma-cortex-a53.cc.o
[ 9%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vdivc-relu-scalar-u1.c.o
[ 9%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vdivc-relu-scalar-u2.c.o
[ 9%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vdivc-relu-scalar-u4.c.o
[ 9%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vdivc-relu-scalar-u8.c.o
[ 9%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vdivc-scalar-u1.c.o
[ 9%] Building CXX object c10/cuda/CMakeFiles/c10_cuda.dir/CUDACachingAllocator.cpp.o
[ 9%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vdivc-scalar-u2.c.o
[ 9%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vdivc-scalar-u4.c.o
[ 9%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vdivc-scalar-u8.c.o
[ 9%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vmax-scalar-u1.c.o
[ 9%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vmax-scalar-u2.c.o
[ 9%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vmax-scalar-u4.c.o
[ 9%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vmax-scalar-u8.c.o
[ 9%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vmaxc-scalar-u1.c.o
[ 10%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vmaxc-scalar-u2.c.o
[ 10%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vmaxc-scalar-u4.c.o
[ 10%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vmaxc-scalar-u8.c.o
[ 10%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vmin-scalar-u1.c.o
[ 10%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vmin-scalar-u2.c.o
[ 10%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vmin-scalar-u4.c.o
[ 10%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vmin-scalar-u8.c.o
[ 10%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vminc-scalar-u1.c.o
[ 10%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vminc-scalar-u2.c.o
[ 10%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vminc-scalar-u4.c.o
[ 10%] Building CXX object confu-deps/XNNPACK/CMakeFiles/jit.dir/src/f32-gemm/gen/f32-gemm-6x8-aarch64-neonfma-cortex-a55.cc.o
[ 10%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vminc-scalar-u8.c.o
[ 10%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vmul-minmax-scalar-u1.c.o
[ 10%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vmul-minmax-scalar-u2.c.o
[ 10%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vmul-minmax-scalar-u4.c.o
[ 10%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vmul-minmax-scalar-u8.c.o
[ 10%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vmul-relu-scalar-u1.c.o
[ 10%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/amalgam/gen/neonfp16arith-aarch64.c.o
[ 10%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vmul-relu-scalar-u2.c.o
[ 10%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vmul-relu-scalar-u4.c.o
[ 10%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vmul-relu-scalar-u8.c.o
[ 10%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vmul-scalar-u1.c.o
[ 10%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vmul-scalar-u2.c.o
[ 10%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vmul-scalar-u4.c.o
[ 10%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/amalgam/gen/neondot.c.o
[ 10%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vmul-scalar-u8.c.o
[ 10%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vmulc-minmax-scalar-u1.c.o
[ 10%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vmulc-minmax-scalar-u2.c.o
[ 10%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vmulc-minmax-scalar-u4.c.o
[ 10%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vmulc-minmax-scalar-u8.c.o
[ 10%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vmulc-relu-scalar-u1.c.o
[ 10%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vmulc-relu-scalar-u2.c.o
[ 10%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vmulc-relu-scalar-u4.c.o
[ 10%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vmulc-relu-scalar-u8.c.o
[ 10%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vmulc-scalar-u1.c.o
[ 10%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vmulc-scalar-u2.c.o
[ 10%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vmulc-scalar-u4.c.o
[ 10%] Building CXX object confu-deps/XNNPACK/CMakeFiles/jit.dir/src/f32-gemm/gen/f32-gemm-6x8-aarch64-neonfma-cortex-a75.cc.o
[ 10%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vmulc-scalar-u8.c.o
[ 10%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vrdivc-minmax-scalar-u1.c.o
[ 10%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vrdivc-minmax-scalar-u2.c.o
[ 10%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vrdivc-minmax-scalar-u4.c.o
[ 10%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vrdivc-minmax-scalar-u8.c.o
[ 10%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vrdivc-relu-scalar-u1.c.o
[ 10%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vrdivc-relu-scalar-u2.c.o
[ 10%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/amalgam/gen/neondot-aarch64.c.o
[ 10%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vrdivc-relu-scalar-u4.c.o
[ 10%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vrdivc-relu-scalar-u8.c.o
[ 10%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vrdivc-scalar-u1.c.o
[ 10%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vrdivc-scalar-u2.c.o
[ 10%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/amalgam/gen/neondotfp16arith.c.o
[ 10%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vrdivc-scalar-u4.c.o
[ 10%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vrdivc-scalar-u8.c.o
[ 10%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vrsubc-minmax-scalar-u1.c.o
[ 10%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vrsubc-minmax-scalar-u2.c.o
[ 10%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vrsubc-minmax-scalar-u4.c.o
[ 10%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vrsubc-minmax-scalar-u8.c.o
[ 10%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vrsubc-relu-scalar-u1.c.o
[ 10%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vrsubc-relu-scalar-u2.c.o
[ 10%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vrsubc-relu-scalar-u4.c.o
[ 10%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vrsubc-relu-scalar-u8.c.o
[ 10%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/amalgam/gen/neondotfp16-aarch64.c.o
[ 10%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vrsubc-scalar-u1.c.o
[ 10%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vrsubc-scalar-u2.c.o
[ 10%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vrsubc-scalar-u4.c.o
[ 10%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vrsubc-scalar-u8.c.o
[ 10%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f16-gemm/gen/f16-gemm-1x8-minmax-asm-aarch64-neonfp16arith-ld64.S.o
[ 11%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f16-gemm/gen/f16-gemm-1x16-minmax-asm-aarch64-neonfp16arith-ld32.S.o
[ 11%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vsqrdiff-scalar-u1.c.o
[ 11%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f16-gemm/gen/f16-gemm-1x16-minmax-asm-aarch64-neonfp16arith-ld64.S.o
[ 11%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f16-gemm/gen/f16-gemm-4x8-minmax-asm-aarch64-neonfp16arith-ld64.S.o
[ 11%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vsqrdiff-scalar-u2.c.o
[ 11%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f16-gemm/gen/f16-gemm-4x16-minmax-asm-aarch64-neonfp16arith-ld32.S.o
[ 11%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f16-gemm/gen/f16-gemm-4x16-minmax-asm-aarch64-neonfp16arith-ld64.S.o
[ 11%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vsqrdiff-scalar-u4.c.o
[ 11%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f16-gemm/gen/f16-gemm-6x8-minmax-asm-aarch64-neonfp16arith-ld64.S.o
[ 11%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f16-gemm/gen/f16-gemm-6x16-minmax-asm-aarch64-neonfp16arith-cortex-a55.S.o
[ 11%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vsqrdiff-scalar-u8.c.o
[ 11%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f16-gemm/gen/f16-gemm-6x16-minmax-asm-aarch64-neonfp16arith-cortex-a55r0.S.o
[ 11%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f16-gemm/gen/f16-gemm-6x16-minmax-asm-aarch64-neonfp16arith-cortex-a75.S.o
[ 11%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vsqrdiffc-scalar-u1.c.o
[ 11%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f16-gemm/gen/f16-gemm-6x16-minmax-asm-aarch64-neonfp16arith-ld32.S.o
[ 11%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f16-gemm/gen/f16-gemm-6x16-minmax-asm-aarch64-neonfp16arith-ld64.S.o
[ 11%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vsqrdiffc-scalar-u2.c.o
[ 11%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f16-gemm/gen/f16-gemm-8x8-minmax-asm-aarch64-neonfp16arith-ld64.S.o
[ 11%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f16-gemm/gen/f16-gemminc-1x8-minmax-asm-aarch64-neonfp16arith-ld64.S.o
[ 11%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vsqrdiffc-scalar-u4.c.o
[ 11%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f16-gemm/gen/f16-gemminc-1x16-minmax-asm-aarch64-neonfp16arith-ld32.S.o
[ 11%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f16-gemm/gen/f16-gemminc-4x8-minmax-asm-aarch64-neonfp16arith-ld64.S.o
[ 11%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vsqrdiffc-scalar-u8.c.o
[ 11%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f16-gemm/gen/f16-gemminc-4x16-minmax-asm-aarch64-neonfp16arith-ld32.S.o
[ 11%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f16-gemm/gen/f16-gemminc-6x8-minmax-asm-aarch64-neonfp16arith-ld64.S.o
[ 11%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vsub-minmax-scalar-u1.c.o
[ 11%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f16-gemm/gen/f16-gemminc-6x16-minmax-asm-aarch64-neonfp16arith-cortex-a55.S.o
[ 11%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f16-gemm/gen/f16-gemminc-6x16-minmax-asm-aarch64-neonfp16arith-cortex-a75.S.o
[ 11%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vsub-minmax-scalar-u2.c.o
[ 11%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f16-gemm/gen/f16-gemminc-6x16-minmax-asm-aarch64-neonfp16arith-ld32.S.o
[ 11%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f16-gemm/gen/f16-gemminc-8x8-minmax-asm-aarch64-neonfp16arith-ld64.S.o
[ 11%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vsub-minmax-scalar-u4.c.o
[ 11%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f16-igemm/f16-igemm-1x16-minmax-asm-aarch64-neonfp16arith-ld32.S.o
[ 11%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f16-igemm/f16-igemm-1x16-minmax-asm-aarch64-neonfp16arith-ld64.S.o
[ 11%] Building CXX object confu-deps/XNNPACK/CMakeFiles/jit.dir/src/f32-gemm/gen/f32-gemm-6x8-aarch64-neonfma-ld128.cc.o
[ 11%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f16-igemm/f16-igemm-4x16-minmax-asm-aarch64-neonfp16arith-ld32.S.o
[ 11%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vsub-minmax-scalar-u8.c.o
[ 11%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f16-igemm/f16-igemm-4x16-minmax-asm-aarch64-neonfp16arith-ld64.S.o
[ 11%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f16-igemm/f16-igemm-6x16-minmax-asm-aarch64-neonfp16arith-cortex-a55.S.o
[ 11%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vsub-relu-scalar-u1.c.o
[ 11%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f16-igemm/f16-igemm-6x16-minmax-asm-aarch64-neonfp16arith-cortex-a55r0.S.o
[ 11%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f16-igemm/f16-igemm-6x16-minmax-asm-aarch64-neonfp16arith-cortex-a75.S.o
[ 11%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vsub-relu-scalar-u2.c.o
[ 11%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f16-igemm/f16-igemm-6x16-minmax-asm-aarch64-neonfp16arith-ld32.S.o
[ 11%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f16-igemm/f16-igemm-6x16-minmax-asm-aarch64-neonfp16arith-ld64.S.o
[ 11%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vsub-relu-scalar-u4.c.o
[ 11%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-dwconv/f32-dwconv-9p4c-minmax-asm-aarch64-neonfma-cortex-a55.S.o
[ 11%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-dwconv/f32-dwconv-9p4c-minmax-asm-aarch64-neonfma.S.o
[ 11%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vsub-relu-scalar-u8.c.o
[ 11%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-gemm/gen/f32-gemm-1x8-minmax-asm-aarch64-neon-ld128-acc2-prfm.S.o
[ 12%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-gemm/gen/f32-gemm-1x8-minmax-asm-aarch64-neon-ld128-acc2.S.o
[ 12%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-gemm/gen/f32-gemm-1x8-minmax-asm-aarch64-neonfma-cortex-a53-prfm.S.o
[ 12%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vsub-scalar-u1.c.o
[ 12%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-gemm/gen/f32-gemm-1x8-minmax-asm-aarch64-neonfma-cortex-a53.S.o
[ 12%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-gemm/gen/f32-gemm-1x8-minmax-asm-aarch64-neonfma-cortex-a75-prfm.S.o
[ 12%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vsub-scalar-u2.c.o
[ 12%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-gemm/gen/f32-gemm-1x8-minmax-asm-aarch64-neonfma-cortex-a75.S.o
[ 12%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-gemm/gen/f32-gemm-1x8-minmax-asm-aarch64-neonfma-ld64-acc2-prfm.S.o
[ 12%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vsub-scalar-u4.c.o
[ 12%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-gemm/gen/f32-gemm-1x8-minmax-asm-aarch64-neonfma-ld64-acc2.S.o
[ 12%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-gemm/gen/f32-gemm-1x8-minmax-asm-aarch64-neonfma-ld64-acc4-prfm.S.o
[ 12%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vsub-scalar-u8.c.o
[ 12%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-gemm/gen/f32-gemm-1x8-minmax-asm-aarch64-neonfma-ld64-acc4.S.o
[ 12%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-gemm/gen/f32-gemm-1x8-minmax-asm-aarch64-neonfma-ld64-prfm.S.o
[ 12%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vsubc-minmax-scalar-u1.c.o
[ 12%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-gemm/gen/f32-gemm-1x8-minmax-asm-aarch64-neonfma-ld64.S.o
[ 12%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-gemm/gen/f32-gemm-1x8-minmax-asm-aarch64-neonfma-ld128-acc2-prfm.S.o
[ 12%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vsubc-minmax-scalar-u2.c.o
[ 12%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-gemm/gen/f32-gemm-1x8-minmax-asm-aarch64-neonfma-ld128-acc2.S.o
[ 12%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-gemm/gen/f32-gemm-1x8-minmax-asm-aarch64-neonfma-ld128-acc4-prfm.S.o
[ 12%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-gemm/gen/f32-gemm-1x8-minmax-asm-aarch64-neonfma-ld128-acc4.S.o
[ 12%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vsubc-minmax-scalar-u4.c.o
[ 12%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-gemm/gen/f32-gemm-1x8-minmax-asm-aarch64-neonfma-ld128-prfm.S.o
[ 12%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-gemm/gen/f32-gemm-1x8-minmax-asm-aarch64-neonfma-ld128.S.o
[ 12%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vsubc-minmax-scalar-u8.c.o
[ 12%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-gemm/gen/f32-gemm-1x12-minmax-asm-aarch64-neonfma-cortex-a53.S.o
[ 12%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-gemm/gen/f32-gemm-4x1-minmax-asm-aarch64-neonfma-ld64.S.o
[ 12%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-gemm/gen/f32-gemm-4x1-minmax-asm-aarch64-neonfma-ld128.S.o
[ 12%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vsubc-relu-scalar-u1.c.o
[ 12%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-gemm/gen/f32-gemm-4x2-minmax-asm-aarch64-neonfma-cortex-a75-prfm.S.o
[ 12%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-gemm/gen/f32-gemm-4x2-minmax-asm-aarch64-neonfma-cortex-a75.S.o
[ 12%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vsubc-relu-scalar-u2.c.o
[ 12%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-gemm/gen/f32-gemm-4x2-minmax-asm-aarch64-neonfma-ld64.S.o
[ 12%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-gemm/gen/f32-gemm-4x2-minmax-asm-aarch64-neonfma-ld128.S.o
[ 12%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vsubc-relu-scalar-u4.c.o
[ 12%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-gemm/gen/f32-gemm-4x8-minmax-asm-aarch64-neonfma-cortex-a53-prfm.S.o
[ 12%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-gemm/gen/f32-gemm-4x8-minmax-asm-aarch64-neonfma-cortex-a53.S.o
[ 12%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vsubc-relu-scalar-u8.c.o
[ 12%] Building CXX object confu-deps/XNNPACK/CMakeFiles/jit.dir/src/f32-igemm/gen/f32-igemm-1x8-aarch64-neonfma-cortex-a53.cc.o
[ 12%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-gemm/gen/f32-gemm-4x8-minmax-asm-aarch64-neonfma-cortex-a55.S.o
[ 12%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-gemm/gen/f32-gemm-4x8-minmax-asm-aarch64-neonfma-cortex-a75-prfm.S.o
[ 12%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vsubc-scalar-u1.c.o
[ 12%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-gemm/gen/f32-gemm-4x8-minmax-asm-aarch64-neonfma-cortex-a75.S.o
[ 12%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-gemm/gen/f32-gemm-4x8-minmax-asm-aarch64-neonfma-ld64.S.o
[ 12%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-gemm/gen/f32-gemm-4x8-minmax-asm-aarch64-neonfma-ld128.S.o
[ 12%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vsubc-scalar-u2.c.o
[ 12%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-gemm/gen/f32-gemm-4x12-minmax-asm-aarch64-neonfma-cortex-a53.S.o
[ 12%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-gemm/gen/f32-gemm-5x8-minmax-asm-aarch64-neonfma-cortex-a75-prfm.S.o
[ 12%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vsubc-scalar-u4.c.o
[ 12%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-gemm/gen/f32-gemm-5x8-minmax-asm-aarch64-neonfma-cortex-a75.S.o
[ 12%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-gemm/gen/f32-gemm-6x8-minmax-asm-aarch64-neonfma-cortex-a53-prfm.S.o
[ 12%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vsubc-scalar-u8.c.o
[ 12%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-gemm/gen/f32-gemm-6x8-minmax-asm-aarch64-neonfma-cortex-a53.S.o
[ 12%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-gemm/gen/f32-gemm-6x8-minmax-asm-aarch64-neonfma-cortex-a55.S.o
[ 12%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vclamp/gen/f32-vclamp-scalar-u1.c.o
[ 12%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-gemm/gen/f32-gemm-6x8-minmax-asm-aarch64-neonfma-cortex-a73.S.o
[ 12%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-gemm/gen/f32-gemm-6x8-minmax-asm-aarch64-neonfma-cortex-a75-prfm.S.o
[ 12%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vclamp/gen/f32-vclamp-scalar-u2.c.o
[ 12%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-gemm/gen/f32-gemm-6x8-minmax-asm-aarch64-neonfma-cortex-a75.S.o
[ 12%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-gemm/gen/f32-gemm-6x8-minmax-asm-aarch64-neonfma-ld64.S.o
[ 12%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-gemm/gen/f32-gemm-6x8-minmax-asm-aarch64-neonfma-ld128.S.o
[ 12%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vclamp/gen/f32-vclamp-scalar-u4.c.o
[ 12%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-gemm/gen/f32-gemm-goi-1x8-minmax-asm-aarch64-neonfma-ld128-prfm.S.o
[ 12%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-gemm/gen/f32-gemm-goi-1x8-minmax-asm-aarch64-neonfma-ld128.S.o
[ 12%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vcmul/gen/f32-vcmul-scalar-u1.c.o
[ 12%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-gemm/gen/f32-gemm-goi-4x8-minmax-asm-aarch64-neonfma-ld128.S.o
[ 12%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-gemminc/gen/f32-gemminc-1x8-minmax-asm-aarch64-neonfma-cortex-a53.S.o
[ 12%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-gemminc/gen/f32-gemminc-1x8-minmax-asm-aarch64-neonfma-cortex-a75-prfm.S.o
[ 12%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vcmul/gen/f32-vcmul-scalar-u2.c.o
[ 12%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-gemminc/gen/f32-gemminc-1x8-minmax-asm-aarch64-neonfma-cortex-a75.S.o
[ 12%] Building CXX object confu-deps/XNNPACK/CMakeFiles/jit.dir/src/f32-igemm/gen/f32-igemm-1x8-aarch64-neonfma-cortex-a75.cc.o
[ 12%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-gemminc/gen/f32-gemminc-1x8-minmax-asm-aarch64-neonfma-ld64.S.o
[ 12%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vcmul/gen/f32-vcmul-scalar-u4.c.o
[ 12%] Building ASM object
confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-gemminc/gen/f32-gemminc-1x12-minmax-asm-aarch64-neonfma-cortex-a53.S.o [ 12%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-gemminc/gen/f32-gemminc-4x8-minmax-asm-aarch64-neonfma-cortex-a53.S.o [ 12%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vcmul/gen/f32-vcmul-scalar-u8.c.o [ 12%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-gemminc/gen/f32-gemminc-4x8-minmax-asm-aarch64-neonfma-cortex-a55.S.o [ 12%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-gemminc/gen/f32-gemminc-4x8-minmax-asm-aarch64-neonfma-cortex-a75-prfm.S.o [ 12%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-gemminc/gen/f32-gemminc-4x8-minmax-asm-aarch64-neonfma-cortex-a75.S.o [ 12%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-velu/gen/f32-velu-scalar-rr2-lut16-p3-u1.c.o [ 12%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-gemminc/gen/f32-gemminc-4x8-minmax-asm-aarch64-neonfma-ld64.S.o [ 12%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-gemminc/gen/f32-gemminc-4x8-minmax-asm-aarch64-neonfma-ld128.S.o [ 12%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-gemminc/gen/f32-gemminc-4x12-minmax-asm-aarch64-neonfma-cortex-a53.S.o [ 12%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-velu/gen/f32-velu-scalar-rr2-lut16-p3-u2.c.o [ 12%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-gemminc/gen/f32-gemminc-5x8-minmax-asm-aarch64-neonfma-cortex-a75-prfm.S.o [ 12%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-gemminc/gen/f32-gemminc-5x8-minmax-asm-aarch64-neonfma-cortex-a75.S.o [ 13%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-gemminc/gen/f32-gemminc-6x8-minmax-asm-aarch64-neonfma-cortex-a53.S.o [ 13%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-velu/gen/f32-velu-scalar-rr2-lut16-p3-u3.c.o [ 13%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-gemminc/gen/f32-gemminc-6x8-minmax-asm-aarch64-neonfma-cortex-a55.S.o [ 13%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-gemminc/gen/f32-gemminc-6x8-minmax-asm-aarch64-neonfma-cortex-a73.S.o [ 13%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-gemminc/gen/f32-gemminc-6x8-minmax-asm-aarch64-neonfma-cortex-a75-prfm.S.o [ 13%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-velu/gen/f32-velu-scalar-rr2-lut16-p3-u4.c.o [ 13%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-gemminc/gen/f32-gemminc-6x8-minmax-asm-aarch64-neonfma-cortex-a75.S.o [ 13%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-gemminc/gen/f32-gemminc-6x8-minmax-asm-aarch64-neonfma-ld64.S.o [ 13%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-gemminc/gen/f32-gemminc-6x8-minmax-asm-aarch64-neonfma-ld128.S.o [ 13%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-igemm/f32-igemm-1x12-minmax-asm-aarch64-neonfma-cortex-a53.S.o [ 13%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-velu/gen/f32-velu-scalar-rr2-lut16-p3-u5.c.o [ 
13%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-igemm/f32-igemm-4x8-minmax-asm-aarch64-neonfma-cortex-a55.S.o [ 13%] Building CXX object confu-deps/XNNPACK/CMakeFiles/jit.dir/src/f32-igemm/gen/f32-igemm-4x8-aarch64-neonfma-cortex-a53.cc.o [ 13%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-igemm/f32-igemm-4x12-minmax-asm-aarch64-neonfma-cortex-a53.S.o [ 13%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-igemm/f32-igemm-6x8-minmax-asm-aarch64-neonfma-cortex-a55.S.o [ 13%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-velu/gen/f32-velu-scalar-rr2-lut16-p3-u6.c.o [ 13%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-igemm/f32-igemm-6x8-minmax-asm-aarch64-neonfma-cortex-a73.S.o [ 13%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-igemm/gen/f32-igemm-1x8-minmax-asm-aarch64-neonfma-cortex-a53-prfm.S.o [ 13%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-igemm/gen/f32-igemm-1x8-minmax-asm-aarch64-neonfma-cortex-a53.S.o [ 13%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-velu/gen/f32-velu-scalar-rr2-p6-u1.c.o [ 13%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-igemm/gen/f32-igemm-1x8-minmax-asm-aarch64-neonfma-cortex-a75-prfm.S.o [ 13%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-igemm/gen/f32-igemm-1x8-minmax-asm-aarch64-neonfma-cortex-a75.S.o [ 13%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-igemm/gen/f32-igemm-1x8-minmax-asm-aarch64-neonfma-ld64-prfm.S.o [ 13%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-velu/gen/f32-velu-scalar-rr2-p6-u2.c.o [ 13%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-igemm/gen/f32-igemm-1x8-minmax-asm-aarch64-neonfma-ld64.S.o [ 13%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-igemm/gen/f32-igemm-4x2-minmax-asm-aarch64-neonfma-cortex-a75-prfm.S.o [ 13%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-igemm/gen/f32-igemm-4x2-minmax-asm-aarch64-neonfma-cortex-a75.S.o [ 13%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-igemm/gen/f32-igemm-4x2-minmax-asm-aarch64-neonfma-ld64.S.o [ 13%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-velu/gen/f32-velu-scalar-rr2-p6-u3.c.o [ 13%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-igemm/gen/f32-igemm-4x8-minmax-asm-aarch64-neonfma-cortex-a53-prfm.S.o [ 13%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-igemm/gen/f32-igemm-4x8-minmax-asm-aarch64-neonfma-cortex-a53.S.o [ 13%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-igemm/gen/f32-igemm-4x8-minmax-asm-aarch64-neonfma-cortex-a75-prfm.S.o [ 13%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-velu/gen/f32-velu-scalar-rr2-p6-u4.c.o [ 13%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-igemm/gen/f32-igemm-4x8-minmax-asm-aarch64-neonfma-cortex-a75.S.o [ 13%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-igemm/gen/f32-igemm-4x8-minmax-asm-aarch64-neonfma-ld64.S.o [ 13%] Building ASM object 
confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-igemm/gen/f32-igemm-4x8-minmax-asm-aarch64-neonfma-ld128.S.o [ 13%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-velu/gen/f32-velu-scalar-rr2-p6-u5.c.o [ 13%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-igemm/gen/f32-igemm-5x8-minmax-asm-aarch64-neonfma-cortex-a75-prfm.S.o [ 13%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-igemm/gen/f32-igemm-5x8-minmax-asm-aarch64-neonfma-cortex-a75.S.o [ 13%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-igemm/gen/f32-igemm-6x8-minmax-asm-aarch64-neonfma-cortex-a53-prfm.S.o [ 13%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-velu/gen/f32-velu-scalar-rr2-p6-u6.c.o [ 13%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-igemm/gen/f32-igemm-6x8-minmax-asm-aarch64-neonfma-cortex-a53.S.o [ 13%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-igemm/gen/f32-igemm-6x8-minmax-asm-aarch64-neonfma-cortex-a75-prfm.S.o [ 13%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-igemm/gen/f32-igemm-6x8-minmax-asm-aarch64-neonfma-cortex-a75.S.o [ 13%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-igemm/gen/f32-igemm-6x8-minmax-asm-aarch64-neonfma-ld64.S.o [ 13%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vhswish/gen/f32-vhswish-scalar-u1.c.o [ 13%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-igemm/gen/f32-igemm-6x8-minmax-asm-aarch64-neonfma-ld128.S.o [ 13%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-ppmm/gen/f32-ppmm-4x8-minmax-asm-aarch64-neonfma-cortex-a75-prfm.S.o [ 13%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-ppmm/gen/f32-ppmm-4x8-minmax-asm-aarch64-neonfma-cortex-a75.S.o [ 13%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vhswish/gen/f32-vhswish-scalar-u2.c.o [ 13%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-ppmm/gen/f32-ppmm-4x8-minmax-asm-aarch64-neonfma-ld128-prfm.S.o [ 13%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-ppmm/gen/f32-ppmm-4x8-minmax-asm-aarch64-neonfma-ld128.S.o [ 13%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vhswish/gen/f32-vhswish-scalar-u4.c.o [ 13%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-ppmm/gen/f32-ppmm-8x8-minmax-asm-aarch64-neonfma-cortex-a75-prfm.S.o [ 13%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-ppmm/gen/f32-ppmm-8x8-minmax-asm-aarch64-neonfma-cortex-a75.S.o [ 13%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-ppmm/gen/f32-ppmm-8x8-minmax-asm-aarch64-neonfma-ld128-prfm.S.o [ 13%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vlrelu/gen/f32-vlrelu-scalar-u1.c.o [ 13%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-ppmm/gen/f32-ppmm-8x8-minmax-asm-aarch64-neonfma-ld128.S.o [ 13%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-qc4w-gemm/gen/f32-qc4w-gemm-1x8-minmax-asm-aarch64-neon-ld128-acc2-prfm.S.o [ 13%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vlrelu/gen/f32-vlrelu-scalar-u2.c.o [ 13%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-qc4w-gemm/gen/f32-qc4w-gemm-1x8-minmax-asm-aarch64-neon-ld128-acc2.S.o [ 13%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-qc4w-gemm/gen/f32-qc4w-gemm-1x8-minmax-asm-aarch64-neonfma-ld64-acc2-prfm.S.o [ 13%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-qc4w-gemm/gen/f32-qc4w-gemm-1x8-minmax-asm-aarch64-neonfma-ld64-acc2.S.o [ 13%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vlrelu/gen/f32-vlrelu-scalar-u4.c.o [ 13%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-qc4w-gemm/gen/f32-qc4w-gemm-1x8-minmax-asm-aarch64-neonfma-ld64-acc4-prfm.S.o [ 13%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-qc4w-gemm/gen/f32-qc4w-gemm-1x8-minmax-asm-aarch64-neonfma-ld64-acc4.S.o [ 13%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vmulcaddc/gen/f32-vmulcaddc-c1-minmax-scalar-2x.c.o [ 13%] Building CXX object confu-deps/XNNPACK/CMakeFiles/jit.dir/src/f32-igemm/gen/f32-igemm-4x8-aarch64-neonfma-cortex-a55.cc.o [ 13%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-qc4w-gemm/gen/f32-qc4w-gemm-1x8-minmax-asm-aarch64-neonfma-ld64-prfm.S.o [ 13%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-qc4w-gemm/gen/f32-qc4w-gemm-1x8-minmax-asm-aarch64-neonfma-ld64.S.o [ 13%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vmulcaddc/gen/f32-vmulcaddc-c2-minmax-scalar-2x.c.o [ 13%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-qc4w-gemm/gen/f32-qc4w-gemm-1x8-minmax-asm-aarch64-neonfma-ld128-acc2-prfm.S.o [ 13%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-qc4w-gemm/gen/f32-qc4w-gemm-1x8-minmax-asm-aarch64-neonfma-ld128-acc2.S.o [ 13%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vmulcaddc/gen/f32-vmulcaddc-c4-minmax-scalar-2x.c.o [ 13%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-qc4w-gemm/gen/f32-qc4w-gemm-1x8-minmax-asm-aarch64-neonfma-ld128-acc4-prfm.S.o [ 13%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-qc4w-gemm/gen/f32-qc4w-gemm-1x8-minmax-asm-aarch64-neonfma-ld128-acc4.S.o [ 14%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vrelu/gen/f32-vrelu-scalar-u1.c.o [ 14%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-qc4w-gemm/gen/f32-qc4w-gemm-1x8-minmax-asm-aarch64-neonfma-ld128-prfm.S.o [ 14%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-qc4w-gemm/gen/f32-qc4w-gemm-1x8-minmax-asm-aarch64-neonfma-ld128.S.o [ 14%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vrelu/gen/f32-vrelu-scalar-u2.c.o [ 14%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-qc4w-gemm/gen/f32-qc4w-gemm-4x1-minmax-asm-aarch64-neonfma-ld64.S.o [ 14%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-qc4w-gemm/gen/f32-qc4w-gemm-4x1-minmax-asm-aarch64-neonfma-ld128.S.o [ 14%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vrelu/gen/f32-vrelu-scalar-u4.c.o [ 14%] Building ASM object 
confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-qc4w-gemm/gen/f32-qc4w-gemm-4x2-minmax-asm-aarch64-neonfma-ld64.S.o [ 15%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-qc4w-gemm/gen/f32-qc4w-gemm-4x2-minmax-asm-aarch64-neonfma-ld128.S.o [ 15%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-qc4w-gemm/gen/f32-qc4w-gemm-4x8-minmax-asm-aarch64-neonfma-ld64.S.o [ 15%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vrelu/gen/f32-vrelu-scalar-u8.c.o [ 15%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-qc4w-gemm/gen/f32-qc4w-gemm-4x8-minmax-asm-aarch64-neonfma-ld128.S.o [ 15%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-qc4w-gemm/gen/f32-qc4w-gemm-6x8-minmax-asm-aarch64-neonfma-ld64.S.o [ 15%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-qc4w-gemm/gen/f32-qc4w-gemm-6x8-minmax-asm-aarch64-neonfma-ld128.S.o [ 15%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vrnd/gen/f32-vrndd-scalar-libm-u1.c.o [ 15%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-1x8-minmax-asm-aarch64-neon-ld128-acc2-prfm.S.o [ 15%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-1x8-minmax-asm-aarch64-neon-ld128-acc2.S.o [ 15%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vrnd/gen/f32-vrndd-scalar-libm-u2.c.o [ 15%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-1x8-minmax-asm-aarch64-neonfma-ld64-acc2-prfm.S.o [ 15%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-1x8-minmax-asm-aarch64-neonfma-ld64-acc2.S.o [ 15%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-1x8-minmax-asm-aarch64-neonfma-ld64-acc4-prfm.S.o [ 15%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vrnd/gen/f32-vrndd-scalar-libm-u4.c.o [ 15%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-1x8-minmax-asm-aarch64-neonfma-ld64-acc4.S.o [ 15%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-1x8-minmax-asm-aarch64-neonfma-ld64-prfm.S.o [ 15%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-1x8-minmax-asm-aarch64-neonfma-ld64.S.o [ 15%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vrnd/gen/f32-vrndne-scalar-libm-u1.c.o [ 15%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-1x8-minmax-asm-aarch64-neonfma-ld128-acc2-prfm.S.o [ 15%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-1x8-minmax-asm-aarch64-neonfma-ld128-acc2.S.o [ 15%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vrnd/gen/f32-vrndne-scalar-libm-u2.c.o [ 15%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-1x8-minmax-asm-aarch64-neonfma-ld128-acc4-prfm.S.o [ 15%] Building ASM object 
confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-1x8-minmax-asm-aarch64-neonfma-ld128-acc4.S.o [ 15%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-1x8-minmax-asm-aarch64-neonfma-ld128-prfm.S.o [ 15%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vrnd/gen/f32-vrndne-scalar-libm-u4.c.o [ 15%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-1x8-minmax-asm-aarch64-neonfma-ld128.S.o [ 15%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-4x1-minmax-asm-aarch64-neonfma-ld64.S.o [ 15%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-4x1-minmax-asm-aarch64-neonfma-ld128.S.o [ 15%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vrnd/gen/f32-vrndu-scalar-libm-u1.c.o [ 15%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-4x2-minmax-asm-aarch64-neonfma-ld64.S.o [ 15%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-4x2-minmax-asm-aarch64-neonfma-ld128.S.o [ 15%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-4x8-minmax-asm-aarch64-neonfma-ld64.S.o [ 15%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vrnd/gen/f32-vrndu-scalar-libm-u2.c.o [ 15%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-4x8-minmax-asm-aarch64-neonfma-ld128.S.o [ 15%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-6x8-minmax-asm-aarch64-neonfma-ld64.S.o [ 15%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vrnd/gen/f32-vrndu-scalar-libm-u4.c.o [ 15%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-6x8-minmax-asm-aarch64-neonfma-ld128.S.o [ 15%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/qd8-f16-qc8w-gemm/gen/qd8-f16-qc8w-gemm-4x16c4-minmax-asm-aarch64-neondot-ld128.S.o [ 15%] Building CXX object confu-deps/XNNPACK/CMakeFiles/jit.dir/src/f32-igemm/gen/f32-igemm-4x8-aarch64-neonfma-cortex-a75.cc.o [ 15%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/qd8-f16-qc8w-gemm/gen/qd8-f16-qc8w-gemm-4x16c4-minmax-asm-aarch64-neondotfp16arith-cortex-a55.S.o [ 15%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vrnd/gen/f32-vrndz-scalar-libm-u1.c.o [ 15%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/qd8-f16-qc8w-igemm/gen/qd8-f16-qc8w-igemm-4x16c4-minmax-asm-aarch64-neondot-cortex-a55.S.o [ 15%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/qd8-f16-qc8w-igemm/gen/qd8-f16-qc8w-igemm-4x16c4-minmax-asm-aarch64-neondot-ld128.S.o [ 15%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vrnd/gen/f32-vrndz-scalar-libm-u2.c.o [ 15%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-4x16c4-minmax-asm-aarch64-neondot-cortex-a55.S.o [ 15%] Building ASM object 
confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-4x16c4-minmax-asm-aarch64-neondot-ld64.S.o [ 15%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-4x16c4-minmax-asm-aarch64-neondot-ld128.S.o [ 15%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vrnd/gen/f32-vrndz-scalar-libm-u4.c.o [ 15%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-4x16c4-minmax-asm-aarch64-neondot-cortex-a55.S.o [ 15%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-4x16c4-minmax-asm-aarch64-neondot-ld128.S.o [ 15%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vrsqrt/gen/f32-vrsqrt-scalar-rsqrt-u1.c.o [ 15%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-1x8c8-minmax-fp32-asm-aarch64-neon-mlal-cortex-a53-prfm.S.o [ 15%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-1x8c8-minmax-fp32-asm-aarch64-neon-mlal-cortex-a53.S.o [ 15%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-1x8c8-minmax-fp32-asm-aarch64-neon-mlal-prfm.S.o [ 15%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vrsqrt/gen/f32-vrsqrt-scalar-rsqrt-u2.c.o [ 15%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-1x8c8-minmax-fp32-asm-aarch64-neon-mlal.S.o [ 15%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-1x16c4-minmax-fp32-asm-aarch64-neondot-ld32.S.o [ 15%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vrsqrt/gen/f32-vrsqrt-scalar-rsqrt-u4.c.o [ 15%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-1x16c4-minmax-fp32-asm-aarch64-neondot-ld64.S.o [ 15%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-2x8c8-minmax-fp32-asm-aarch64-neon-mlal-cortex-a53-prfm.S.o [ 15%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-2x8c8-minmax-fp32-asm-aarch64-neon-mlal-cortex-a53.S.o [ 15%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-scalar-rr2-lut64-p2-div-u1.c.o [ 15%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-2x8c8-minmax-fp32-asm-aarch64-neon-mlal-prfm.S.o [ 15%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-2x8c8-minmax-fp32-asm-aarch64-neon-mlal.S.o [ 15%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-scalar-rr2-lut64-p2-div-u2.c.o [ 15%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-2x8c8-minmax-fp32-asm-aarch64-neon-mull.S.o [ 15%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-2x8c16-minmax-fp32-asm-aarch64-neon-mlal.S.o [ 15%] Building ASM object 
confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-4x16-minmax-fp32-asm-aarch64-neon-mlal-lane-cortex-a53-prfm.S.o [ 15%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-scalar-rr2-lut64-p2-div-u4.c.o [ 15%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-4x16-minmax-fp32-asm-aarch64-neon-mlal-lane-cortex-a53.S.o [ 15%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-4x16-minmax-fp32-asm-aarch64-neon-mlal-lane-ld64-prfm.S.o [ 15%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-4x16-minmax-fp32-asm-aarch64-neon-mlal-lane-ld64.S.o [ 15%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-4x16c4-minmax-fp32-asm-aarch64-neondot-cortex-a55.S.o [ 15%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-scalar-rr2-lut2048-p1-div-u1.c.o [ 15%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-4x16c4-minmax-fp32-asm-aarch64-neondot-ld32.S.o [ 15%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-4x16c4-minmax-fp32-asm-aarch64-neondot-ld64.S.o [ 15%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-scalar-rr2-lut2048-p1-div-u2.c.o [ 15%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-4x16c4-minmax-fp32-asm-aarch64-neondot-ld128.S.o [ 15%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-1x8c8-minmax-fp32-asm-aarch64-neon-mlal-cortex-a53-prfm.S.o [ 15%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-1x8c8-minmax-fp32-asm-aarch64-neon-mlal-cortex-a53.S.o [ 15%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-1x8c8-minmax-fp32-asm-aarch64-neon-mlal-prfm.S.o [ 15%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-scalar-rr2-lut2048-p1-div-u4.c.o [ 15%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-1x8c8-minmax-fp32-asm-aarch64-neon-mlal.S.o [ 16%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-2x8c8-minmax-fp32-asm-aarch64-neon-mlal-cortex-a53-prfm.S.o [ 16%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-2x8c8-minmax-fp32-asm-aarch64-neon-mlal-cortex-a53.S.o [ 16%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-scalar-rr2-p5-div-u1.c.o [ 16%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-2x8c8-minmax-fp32-asm-aarch64-neon-mlal-prfm.S.o [ 16%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-2x8c8-minmax-fp32-asm-aarch64-neon-mlal.S.o [ 16%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-2x8c16-minmax-fp32-asm-aarch64-neon-mlal.S.o [ 16%] 
Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-scalar-rr2-p5-div-u2.c.o [ 16%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-4x16-minmax-fp32-asm-aarch64-neon-mlal-lane-cortex-a53-prfm.S.o [ 16%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-4x16-minmax-fp32-asm-aarch64-neon-mlal-lane-cortex-a53.S.o [ 16%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-scalar-rr2-p5-div-u4.c.o [ 16%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-4x16-minmax-fp32-asm-aarch64-neon-mlal-lane-ld64-prfm.S.o [ 16%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-4x16-minmax-fp32-asm-aarch64-neon-mlal-lane-ld64.S.o [ 16%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-4x16c4-minmax-fp32-asm-aarch64-neondot-cortex-a55.S.o [ 16%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-4x16c4-minmax-fp32-asm-aarch64-neondot-ld64.S.o [ 16%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsqrt/gen/f32-vsqrt-scalar-sqrt-u1.c.o [ 16%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-4x16c4-minmax-fp32-asm-aarch64-neondot-ld128.S.o [ 16%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/qu8-gemm/gen/qu8-gemm-4x8c4-minmax-rndnu-asm-aarch64-neondot-cortex-a55.S.o [ 16%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsqrt/gen/f32-vsqrt-scalar-sqrt-u2.c.o [ 16%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/qu8-gemm/gen/qu8-gemm-4x8c4-minmax-rndnu-asm-aarch64-neondot-ld128.S.o [ 16%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/qu8-gemm/gen/qu8-gemm-4x16-minmax-rndnu-asm-aarch64-neon-mlal-lane-cortex-a53-prfm.S.o [ 16%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/qu8-gemm/gen/qu8-gemm-4x16-minmax-rndnu-asm-aarch64-neon-mlal-lane-cortex-a53.S.o [ 16%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsqrt/gen/f32-vsqrt-scalar-sqrt-u4.c.o [ 16%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/qu8-gemm/gen/qu8-gemm-4x16-minmax-rndnu-asm-aarch64-neon-mlal-lane-cortex-a75-prfm.S.o [ 16%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/qu8-gemm/gen/qu8-gemm-4x16-minmax-rndnu-asm-aarch64-neon-mlal-lane-cortex-a75.S.o [ 16%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-scalar-expm1minus-rr1-lut8-p4h3ts-div-u1.c.o [ 16%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/qu8-gemm/gen/qu8-gemm-4x16-minmax-rndnu-asm-aarch64-neon-mlal-lane-ld64-prfm.S.o [ 16%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/qu8-gemm/gen/qu8-gemm-4x16-minmax-rndnu-asm-aarch64-neon-mlal-lane-ld64.S.o [ 16%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/qu8-gemm/gen/qu8-gemm-4x16c4-minmax-fp32-asm-aarch64-neondot-cortex-a55.S.o [ 16%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-scalar-expm1minus-rr1-lut8-p4h3ts-div-u2.c.o [ 16%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/qu8-gemm/gen/qu8-gemm-4x16c4-minmax-fp32-asm-aarch64-neondot-ld128.S.o [ 16%] Building CXX object confu-deps/XNNPACK/CMakeFiles/jit.dir/src/f32-igemm/gen/f32-igemm-4x8-aarch64-neonfma-ld128.cc.o [ 16%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/qu8-gemm/gen/qu8-gemm-4x16c4-minmax-rndnu-asm-aarch64-neondot-cortex-a55.S.o [ 16%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/qu8-gemm/gen/qu8-gemm-4x16c4-minmax-rndnu-asm-aarch64-neondot-ld128.S.o [ 16%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-scalar-expm1minus-rr1-lut8-p4h3ts-div-u4.c.o [ 16%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/qu8-igemm/gen/qu8-igemm-4x8c4-minmax-rndnu-asm-aarch64-neondot-cortex-a55.S.o [ 16%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/qu8-igemm/gen/qu8-igemm-4x8c4-minmax-rndnu-asm-aarch64-neondot-ld128.S.o [ 16%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/qu8-igemm/gen/qu8-igemm-4x16-minmax-rndnu-asm-aarch64-neon-mlal-lane-cortex-a53-prfm.S.o [ 16%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-scalar-expm1minus-rr1-p6h5ts-div-u1.c.o [ 16%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/qu8-igemm/gen/qu8-igemm-4x16-minmax-rndnu-asm-aarch64-neon-mlal-lane-cortex-a53.S.o [ 16%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/qu8-igemm/gen/qu8-igemm-4x16-minmax-rndnu-asm-aarch64-neon-mlal-lane-cortex-a75-prfm.S.o [ 16%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/qu8-igemm/gen/qu8-igemm-4x16-minmax-rndnu-asm-aarch64-neon-mlal-lane-cortex-a75.S.o [ 16%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-scalar-expm1minus-rr1-p6h5ts-div-u2.c.o [ 16%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/qu8-igemm/gen/qu8-igemm-4x16-minmax-rndnu-asm-aarch64-neon-mlal-lane-ld64-prfm.S.o [ 16%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/qu8-igemm/gen/qu8-igemm-4x16-minmax-rndnu-asm-aarch64-neon-mlal-lane-ld64.S.o [ 16%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/qu8-igemm/gen/qu8-igemm-4x16c4-minmax-fp32-asm-aarch64-neondot-cortex-a55.S.o [ 16%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-scalar-expm1minus-rr1-p6h5ts-div-u4.c.o [ 16%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/qu8-igemm/gen/qu8-igemm-4x16c4-minmax-fp32-asm-aarch64-neondot-ld128.S.o [ 16%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/qu8-igemm/gen/qu8-igemm-4x16c4-minmax-rndnu-asm-aarch64-neondot-cortex-a55.S.o [ 16%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/qu8-igemm/gen/qu8-igemm-4x16c4-minmax-rndnu-asm-aarch64-neondot-ld128.S.o [ 16%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vunary/gen/f32-vabs-scalar-u1.c.o [ 16%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/tables/exp2-k-over-64.c.o [ 16%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/tables/exp2-k-over-2048.c.o [ 16%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vunary/gen/f32-vabs-scalar-u2.c.o [ 16%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/tables/exp2minus-k-over-4.c.o [ 16%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/tables/exp2minus-k-over-8.c.o [ 16%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/tables/exp2minus-k-over-16.c.o [ 16%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vunary/gen/f32-vabs-scalar-u4.c.o [ 16%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/tables/exp2minus-k-over-32.c.o [ 16%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/tables/exp2minus-k-over-64.c.o [ 16%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vunary/gen/f32-vneg-scalar-u1.c.o [ 16%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/tables/exp2minus-k-over-2048.c.o [ 16%] Building CXX object confu-deps/XNNPACK/CMakeFiles/jit.dir/src/f32-igemm/gen/f32-igemm-6x8-aarch64-neonfma-cortex-a53.cc.o [ 16%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/tables/vlog.c.o [ 16%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vunary/gen/f32-vneg-scalar-u2.c.o [ 16%] Built target microkernels-prod [ 16%] Linking CXX static library ../lib/libcaffe2_protos.a [ 16%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vunary/gen/f32-vneg-scalar-u4.c.o [ 16%] Built target caffe2_protos [ 16%] Building C object confu-deps/XNNPACK/CMakeFiles/cache.dir/src/cache.c.o [ 16%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vunary/gen/f32-vsqr-scalar-u1.c.o [ 16%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vunary/gen/f32-vsqr-scalar-u2.c.o [ 16%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vunary/gen/f32-vsqr-scalar-u4.c.o [ 16%] Built target cache [ 16%] Building C object confu-deps/XNNPACK/CMakeFiles/operators.dir/src/operator-delete.c.o [ 16%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/i16-vlshift/gen/i16-vlshift-scalar-u1.c.o [ 16%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/i16-vlshift/gen/i16-vlshift-scalar-u2.c.o [ 16%] Building C object confu-deps/XNNPACK/CMakeFiles/operators.dir/src/operators/argmax-pooling-nhwc.c.o [ 16%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/i16-vlshift/gen/i16-vlshift-scalar-u3.c.o [ 16%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/i16-vlshift/gen/i16-vlshift-scalar-u4.c.o [ 16%] Building C object confu-deps/XNNPACK/CMakeFiles/operators.dir/src/operators/average-pooling-nhwc.c.o [ 16%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-expm1minus-scalar-rr2-lut4-p4.c.o [ 16%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-expm1minus-scalar-rr2-lut8-p3.c.o [ 16%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-expm1minus-scalar-rr2-lut8-p4.c.o [ 16%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-expm1minus-scalar-rr2-lut16-p3.c.o [ 16%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-expm1minus-scalar-rr2-lut16-p4.c.o [ 16%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-expm1minus-scalar-rr2-p5.c.o [ 16%] Building C object confu-deps/XNNPACK/CMakeFiles/operators.dir/src/operators/batch-matrix-multiply-nc.c.o [ 16%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-expm1minus-scalar-rr2-p6.c.o [ 16%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-expminus-scalar-rr2-lut64-p2.c.o [ 16%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-expminus-scalar-rr2-lut2048-p1.c.o [ 16%] Building C object confu-deps/XNNPACK/CMakeFiles/operators.dir/src/operators/binary-elementwise-nd.c.o [ 16%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-expminus-scalar-rr2-p5.c.o [ 17%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-f16-cvt-scalar-bitcast.c.o [ 17%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-f16-cvt-scalar-fabsf.c.o [ 17%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-roundd-scalar-addsub.c.o [ 17%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-roundd-scalar-cvt.c.o [ 17%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-roundd-scalar-floor.c.o [ 17%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-roundne-scalar-addsub.c.o [ 17%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-roundne-scalar-nearbyint.c.o [ 17%] Building CXX object confu-deps/XNNPACK/CMakeFiles/jit.dir/src/f32-igemm/gen/f32-igemm-6x8-aarch64-neonfma-cortex-a55.cc.o [ 17%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-roundne-scalar-rint.c.o [ 17%] Building CXX object c10/cuda/CMakeFiles/c10_cuda.dir/CUDADeviceAssertionHost.cpp.o [ 17%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-roundu-scalar-addsub.c.o [ 17%] Building C object confu-deps/XNNPACK/CMakeFiles/operators.dir/src/operators/channel-shuffle-nc.c.o [ 17%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-roundu-scalar-ceil.c.o [ 17%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-roundu-scalar-cvt.c.o [ 18%] Building C object confu-deps/XNNPACK/CMakeFiles/operators.dir/src/operators/constant-pad-nd.c.o [ 18%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-roundz-scalar-addsub.c.o [ 18%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-roundz-scalar-cvt.c.o [ 18%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-roundz-scalar-trunc.c.o [ 18%] Building C object confu-deps/XNNPACK/CMakeFiles/operators.dir/src/operators/convolution-nchw.c.o [ 18%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-sigmoid-scalar-rr2-lut64-p2-div.c.o [ 18%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-sigmoid-scalar-rr2-lut2048-p1-div.c.o [ 18%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-sigmoid-scalar-rr2-p5-div.c.o [ 18%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1minus-rr1-lut4-p4h2ts-div.c.o [ 
18%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1minus-rr1-lut4-p4h2ts-rcp.c.o [ 18%] Building C object confu-deps/XNNPACK/CMakeFiles/operators.dir/src/operators/convolution-nhwc.c.o [ 18%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1minus-rr1-lut4-p4h3ps-div.c.o [ 18%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1minus-rr1-lut4-p4h3ts-div.c.o [ 18%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1minus-rr1-lut8-p3h1ts-div.c.o [ 18%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1minus-rr1-lut8-p4h2ts-div.c.o [ 18%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1minus-rr1-lut8-p4h2ts-rcp.c.o [ 18%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1minus-rr1-lut8-p4h3ps-div.c.o [ 18%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1minus-rr1-lut8-p4h3ps-rcp.c.o [ 18%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1minus-rr1-lut8-p4h3ts-div.c.o [ 18%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1minus-rr1-lut8-p4h3ts-rcp.c.o [ 18%] Building C object confu-deps/XNNPACK/CMakeFiles/operators.dir/src/operators/deconvolution-nhwc.c.o [ 18%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1minus-rr1-lut16-p3h1ts-div.c.o [ 18%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1minus-rr1-lut16-p4h2ts-div.c.o [ 18%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1minus-rr1-lut16-p4h2ts-rcp.c.o [ 18%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1minus-rr1-lut16-p4h3ps-div.c.o [ 18%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1minus-rr1-lut16-p4h3ts-div.c.o [ 18%] Building CXX object c10/cuda/CMakeFiles/c10_cuda.dir/CUDAException.cpp.o [ 18%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1minus-rr1-lut32-p3h1ts-div.c.o [ 18%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1minus-rr1-lut64-p3h1ts-div.c.o [ 18%] Building C object confu-deps/XNNPACK/CMakeFiles/operators.dir/src/operators/dynamic-fully-connected-nc.c.o [ 18%] Building CXX object confu-deps/XNNPACK/CMakeFiles/jit.dir/src/f32-igemm/gen/f32-igemm-6x8-aarch64-neonfma-cortex-a75.cc.o [ 18%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1minus-rr1-p6h4ts-div.c.o [ 18%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1minus-rr1-p6h5ps-div.c.o [ 18%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1minus-rr1-p6h5ps-rcp.c.o [ 18%] Building C object confu-deps/XNNPACK/CMakeFiles/operators.dir/src/operators/fully-connected-nc.c.o [ 18%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1minus-rr1-p6h5ts-div.c.o [ 18%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1minus-rr1-p6h5ts-rcp.c.o [ 18%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1minus-rr2-lut4-p4h2ts-div.c.o [ 18%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1minus-rr2-lut4-p4h3ps-div.c.o [ 18%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1minus-rr2-lut4-p4h3ts-div.c.o [ 18%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1minus-rr2-lut8-p3h1ts-div.c.o [ 18%] Building C object confu-deps/XNNPACK/CMakeFiles/operators.dir/src/operators/global-average-pooling-ncw.c.o [ 18%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1minus-rr2-lut8-p4h2ts-div.c.o [ 18%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1minus-rr2-lut8-p4h2ts-rcp.c.o [ 18%] Building C object confu-deps/XNNPACK/CMakeFiles/operators.dir/src/operators/global-average-pooling-nwc.c.o [ 18%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1minus-rr2-lut8-p4h3ps-div.c.o [ 18%] Building CXX object c10/cuda/CMakeFiles/c10_cuda.dir/CUDAFunctions.cpp.o [ 18%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1minus-rr2-lut8-p4h3ps-rcp.c.o [ 18%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1minus-rr2-lut8-p4h3ts-div.c.o [ 18%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1minus-rr2-lut8-p4h3ts-rcp.c.o [ 18%] Building C object confu-deps/XNNPACK/CMakeFiles/operators.dir/src/operators/lut-elementwise-nc.c.o [ 18%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1minus-rr2-lut16-p3h1ts-div.c.o [ 18%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1minus-rr2-lut16-p4h2ts-div.c.o [ 18%] Building C object confu-deps/XNNPACK/CMakeFiles/operators.dir/src/operators/max-pooling-nhwc.c.o [ 18%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1minus-rr2-lut16-p4h3ps-div.c.o [ 18%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1minus-rr2-lut16-p4h3ts-div.c.o [ 18%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1minus-rr2-lut32-p3h1ts-div.c.o [ 18%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1minus-rr2-lut64-p3h1ts-div.c.o [ 18%] Building C object confu-deps/XNNPACK/CMakeFiles/operators.dir/src/operators/prelu-nc.c.o [ 18%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1minus-rr2-p6h4ts-div.c.o [ 18%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1minus-rr2-p6h5ps-div.c.o [ 18%] Building C object confu-deps/XNNPACK/CMakeFiles/operators.dir/src/operators/reduce-nd.c.o [ 18%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1minus-rr2-p6h5ts-div.c.o [ 19%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1plus-rr1-lut4-p4h2ts-div.c.o [ 19%] Building C object confu-deps/XNNPACK/CMakeFiles/operators.dir/src/operators/resize-bilinear-nchw.c.o [ 19%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1plus-rr1-lut4-p4h3ps-div.c.o [ 19%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1plus-rr1-lut4-p4h3ts-div.c.o [ 19%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1plus-rr1-lut8-p3h1ts-div.c.o [ 19%] Building C object confu-deps/XNNPACK/CMakeFiles/operators.dir/src/operators/resize-bilinear-nhwc.c.o [ 19%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1plus-rr1-lut8-p4h2ts-div.c.o [ 19%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1plus-rr1-lut8-p4h3ps-div.c.o [ 19%] Building C object confu-deps/XNNPACK/CMakeFiles/operators.dir/src/operators/rope-nthc.c.o [ 19%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1plus-rr1-lut8-p4h3ts-div.c.o [ 19%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1plus-rr1-lut16-p3h1ts-div.c.o [ 19%] Building C object confu-deps/XNNPACK/CMakeFiles/operators.dir/src/operators/scaled-dot-product-attention-nhtc.c.o [ 19%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1plus-rr1-lut16-p4h2ts-div.c.o [ 19%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1plus-rr1-lut16-p4h3ps-div.c.o [ 19%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1plus-rr1-lut16-p4h3ts-div.c.o [ 19%] Building CXX object c10/cuda/CMakeFiles/c10_cuda.dir/CUDAMallocAsyncAllocator.cpp.o [ 19%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1plus-rr1-lut32-p3h1ts-div.c.o [ 19%] Building C object confu-deps/XNNPACK/CMakeFiles/operators.dir/src/operators/slice-nd.c.o [ 19%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1plus-rr1-lut64-p3h1ts-div.c.o [ 19%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1plus-rr1-p6h4ts-div.c.o [ 19%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1plus-rr1-p6h5ps-div.c.o [ 19%] Building C object confu-deps/XNNPACK/CMakeFiles/operators.dir/src/operators/softmax-nc.c.o [ 19%] Building CXX object confu-deps/XNNPACK/CMakeFiles/jit.dir/src/f32-igemm/gen/f32-igemm-6x8-aarch64-neonfma-ld128.cc.o [ 19%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1plus-rr1-p6h5ts-div.c.o [ 19%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1plus-rr2-lut4-p4h2ts-div.c.o [ 19%] Building C object confu-deps/XNNPACK/CMakeFiles/operators.dir/src/operators/transpose-nd.c.o [ 19%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1plus-rr2-lut4-p4h3ps-div.c.o [ 19%] 
Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1plus-rr2-lut4-p4h3ts-div.c.o [ 19%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1plus-rr2-lut8-p3h1ts-div.c.o [ 19%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1plus-rr2-lut8-p4h2ts-div.c.o [ 19%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1plus-rr2-lut8-p4h3ps-div.c.o [ 19%] Building C object confu-deps/XNNPACK/CMakeFiles/operators.dir/src/operators/unary-elementwise-nc.c.o [ 19%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1plus-rr2-lut8-p4h3ts-div.c.o [ 19%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1plus-rr2-lut16-p3h1ts-div.c.o [ 19%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1plus-rr2-lut16-p4h2ts-div.c.o [ 19%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1plus-rr2-lut16-p4h3ps-div.c.o [ 19%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1plus-rr2-lut16-p4h3ts-div.c.o [ 19%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1plus-rr2-lut32-p3h1ts-div.c.o [ 19%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1plus-rr2-lut64-p3h1ts-div.c.o [ 19%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1plus-rr2-p6h4ts-div.c.o [ 19%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1plus-rr2-p6h5ps-div.c.o [ 19%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1plus-rr2-p6h5ts-div.c.o [ 19%] Built target jit [ 19%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/u32-sqrt-scalar-bitmanip.c.o [ 19%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/u32-sqrt-scalar-clz-binsearch.c.o [ 19%] Building C object confu-deps/XNNPACK/CMakeFiles/operators.dir/src/operators/unpooling-nhwc.c.o [ 19%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/u32-sqrt-scalar-clz-newton.c.o [ 19%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/u32-sqrt-scalar-cvti32-sqrt-lrint.c.o [ 19%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/u32-sqrt-scalar-cvti64-sqrt-lrint.c.o [ 19%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/u32-sqrt-scalar-cvti64-sqrtf-lrintf.c.o [ 19%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/u32-sqrt-scalar-cvtu32-sqrt-lrint.c.o [ 19%] Built target operators [ 19%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/u32-sqrt-scalar-cvtu32-sqrtf-lrintf.c.o [ 19%] Building C object confu-deps/XNNPACK/CMakeFiles/subgraph.dir/src/memory-planner.c.o [ 19%] Building CXX object c10/cuda/CMakeFiles/c10_cuda.dir/CUDAMiscFunctions.cpp.o [ 19%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/u32-sqrt-scalar-hashemian.c.o [ 19%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/u32-sqrt-scalar-tflm.c.o [ 19%] Building C object confu-deps/XNNPACK/CMakeFiles/subgraph.dir/src/runtime.c.o [ 19%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/u64-sqrt-scalar-cvtu32-sqrt-cvtsatu32f64.c.o [ 19%] Building CXX object c10/cuda/CMakeFiles/c10_cuda.dir/CUDAStream.cpp.o [ 19%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/u64-sqrt-scalar-cvtu32-sqrt-llrint.c.o [ 19%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/u64-sqrt-scalar-cvtu64-sqrt-llrint.c.o [ 19%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-1x1-minmax-scalar.c.o [ 19%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-1x2-minmax-scalar.c.o [ 19%] Building C object confu-deps/XNNPACK/CMakeFiles/subgraph.dir/src/subgraph.c.o [ 19%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-1x4-minmax-scalar.c.o [ 19%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-1x8-minmax-scalar.c.o [ 19%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-2x2-minmax-scalar.c.o [ 19%] Building C object confu-deps/XNNPACK/CMakeFiles/subgraph.dir/src/subgraph/abs.c.o [ 19%] Building C object confu-deps/XNNPACK/CMakeFiles/subgraph.dir/src/subgraph/add2.c.o [ 19%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-2x4-minmax-scalar.c.o [ 19%] Building C object confu-deps/XNNPACK/CMakeFiles/subgraph.dir/src/subgraph/argmax-pooling-2d.c.o [ 19%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-2x8-minmax-scalar.c.o [ 19%] Building C object confu-deps/XNNPACK/CMakeFiles/subgraph.dir/src/subgraph/average-pooling-2d.c.o [ 19%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-4x4-minmax-scalar.c.o [ 19%] Building C object confu-deps/XNNPACK/CMakeFiles/subgraph.dir/src/subgraph/bankers-rounding.c.o [ 19%] Building C object confu-deps/XNNPACK/CMakeFiles/subgraph.dir/src/subgraph/batch-matrix-multiply.c.o [ 19%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-1x2-minmax-scalar.c.o [ 19%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-1x4-minmax-scalar.c.o [ 19%] Building C object confu-deps/XNNPACK/CMakeFiles/subgraph.dir/src/subgraph/ceiling.c.o [ 19%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-1x8-minmax-scalar.c.o [ 19%] Building C object confu-deps/XNNPACK/CMakeFiles/subgraph.dir/src/subgraph/clamp.c.o [ 19%] Building CXX object c10/cuda/CMakeFiles/c10_cuda.dir/impl/CUDAGuardImpl.cpp.o [ 19%] Building C object confu-deps/XNNPACK/CMakeFiles/subgraph.dir/src/subgraph/concatenate.c.o [ 19%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-2x2-minmax-scalar.c.o [ 19%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-2x4-minmax-scalar.c.o [ 19%] Building C object 
confu-deps/XNNPACK/CMakeFiles/subgraph.dir/src/subgraph/convert.c.o [ 19%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-2x8-minmax-scalar.c.o [ 19%] Building C object confu-deps/XNNPACK/CMakeFiles/subgraph.dir/src/subgraph/convolution-2d.c.o [ 19%] Building CXX object c10/cuda/CMakeFiles/c10_cuda.dir/impl/CUDATest.cpp.o [ 19%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-4x4-minmax-scalar.c.o [ 19%] Building C object confu-deps/XNNPACK/CMakeFiles/subgraph.dir/src/subgraph/copy.c.o [ 20%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-1x2-minmax-scalar.c.o [ 20%] Building C object confu-deps/XNNPACK/CMakeFiles/subgraph.dir/src/subgraph/deconvolution-2d.c.o [ 20%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-1x4-minmax-scalar.c.o [ 20%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-1x8-minmax-scalar.c.o [ 20%] Building C object confu-deps/XNNPACK/CMakeFiles/subgraph.dir/src/subgraph/depth-to-space-2d.c.o [ 20%] Building C object confu-deps/XNNPACK/CMakeFiles/subgraph.dir/src/subgraph/depthwise-convolution-2d.c.o [ 20%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-2x2-minmax-scalar.c.o [ 20%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-2x4-minmax-scalar.c.o [ 20%] Building CXX object c10/cuda/CMakeFiles/c10_cuda.dir/driver_api.cpp.o [ 20%] Building C object confu-deps/XNNPACK/CMakeFiles/subgraph.dir/src/subgraph/divide.c.o [ 20%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-2x8-minmax-scalar.c.o [ 20%] Building C object confu-deps/XNNPACK/CMakeFiles/subgraph.dir/src/subgraph/elu.c.o [ 20%] Building C object confu-deps/XNNPACK/CMakeFiles/subgraph.dir/src/subgraph/even-split.c.o [ 20%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-4x4-minmax-scalar.c.o [ 20%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-5f5m5l1c1s1r-minmax-fp32-scalar-fmagic.c.o [ 20%] Building C object confu-deps/XNNPACK/CMakeFiles/subgraph.dir/src/subgraph/floor.c.o [ 20%] Building C object confu-deps/XNNPACK/CMakeFiles/subgraph.dir/src/subgraph/fully-connected-sparse.c.o [ 20%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-5f5m5l1c1s1r-minmax-fp32-scalar-imagic.c.o [ 20%] Building C object confu-deps/XNNPACK/CMakeFiles/subgraph.dir/src/subgraph/fully-connected.c.o [ 20%] Building C object confu-deps/XNNPACK/CMakeFiles/subgraph.dir/src/subgraph/global-average-pooling.c.o [ 20%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-5f5m5l1c1s1r-minmax-fp32-scalar-lrintf.c.o [ 20%] Building C object confu-deps/XNNPACK/CMakeFiles/subgraph.dir/src/subgraph/global-sum-pooling.c.o [ 20%] Building C object confu-deps/XNNPACK/CMakeFiles/subgraph.dir/src/subgraph/hardswish.c.o [ 20%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-5f5m5l2c1s1r-minmax-fp32-scalar-fmagic.c.o [ 21%] Building C object 
confu-deps/XNNPACK/CMakeFiles/subgraph.dir/src/subgraph/leaky-relu.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/subgraph.dir/src/subgraph/max-pooling-2d.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-5f5m5l2c1s1r-minmax-fp32-scalar-imagic.c.o [ 21%] Linking CXX shared library ../../lib/libc10_cuda.so [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/subgraph.dir/src/subgraph/maximum2.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/subgraph.dir/src/subgraph/minimum2.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/subgraph.dir/src/subgraph/multiply2.c.o Warning: Unused direct dependencies: libc10.so.2.4 /lib64/libgflags.so.2.2 /lib64/libglog.so.0 /lib64/libm.so.6 [ 21%] Built target c10_cuda [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/subgraph.dir/src/subgraph/negate.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/subgraph.dir/src/subgraph/prelu.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-5f5m5l2c1s1r-minmax-fp32-scalar-lrintf.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/subgraph.dir/src/subgraph/reshape-helpers.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/subgraph.dir/src/subgraph/scaled-dot-product-attention.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/subgraph.dir/src/subgraph/sigmoid.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/subgraph.dir/src/subgraph/softmax.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-5f5m5l4c1s1r-minmax-fp32-scalar-fmagic.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/subgraph.dir/src/subgraph/space-to-depth-2d.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/subgraph.dir/src/subgraph/square-root.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/subgraph.dir/src/subgraph/square.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/subgraph.dir/src/subgraph/squared-difference.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/subgraph.dir/src/subgraph/static-constant-pad.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/subgraph.dir/src/subgraph/static-mean.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-5f5m5l4c1s1r-minmax-fp32-scalar-imagic.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/subgraph.dir/src/subgraph/static-reshape.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/subgraph.dir/src/subgraph/static-resize-bilinear-2d.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/subgraph.dir/src/subgraph/static-slice.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/subgraph.dir/src/subgraph/static-transpose.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/subgraph.dir/src/subgraph/subtract.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/subgraph.dir/src/subgraph/tanh.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-5f5m5l4c1s1r-minmax-fp32-scalar-lrintf.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/subgraph.dir/src/subgraph/unpooling-2d.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/subgraph.dir/src/subgraph/validation.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/subgraph.dir/src/tensor.c.o [ 21%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-6f6m7l1c1s1r-minmax-fp32-scalar-fmagic.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-6f6m7l1c1s1r-minmax-fp32-scalar-imagic.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-6f6m7l1c1s1r-minmax-fp32-scalar-lrintf.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-6f6m7l2c1s1r-minmax-fp32-scalar-fmagic.c.o [ 21%] Built target subgraph [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/XNNPACK.dir/src/configs/argmaxpool-config.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-6f6m7l2c1s1r-minmax-fp32-scalar-imagic.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/XNNPACK.dir/src/configs/avgpool-config.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-6f6m7l2c1s1r-minmax-fp32-scalar-lrintf.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/XNNPACK.dir/src/configs/binary-elementwise-config.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-6f6m7l4c1s1r-minmax-fp32-scalar-fmagic.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-6f6m7l4c1s1r-minmax-fp32-scalar-imagic.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-6f6m7l4c1s1r-minmax-fp32-scalar-lrintf.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/XNNPACK.dir/src/configs/cmul-config.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/XNNPACK.dir/src/configs/conv-hwc2chw-config.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-8f8m9l1c1s1r-minmax-fp32-scalar-fmagic.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/XNNPACK.dir/src/configs/dwconv-config.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-8f8m9l1c1s1r-minmax-fp32-scalar-imagic.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-8f8m9l1c1s1r-minmax-fp32-scalar-lrintf.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-8f8m9l2c1s1r-minmax-fp32-scalar-fmagic.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/XNNPACK.dir/src/configs/dwconv2d-chw-config.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-8f8m9l2c1s1r-minmax-fp32-scalar-imagic.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/XNNPACK.dir/src/configs/experiments-config.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-8f8m9l2c1s1r-minmax-fp32-scalar-lrintf.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/XNNPACK.dir/src/configs/gavgpool-config.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-8f8m9l4c1s1r-minmax-fp32-scalar-fmagic.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/XNNPACK.dir/src/configs/gavgpool-cw-config.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-8f8m9l4c1s1r-minmax-fp32-scalar-imagic.c.o [ 21%] 
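The "Warning: Unused direct dependencies" message above, printed right after "Linking CXX shared library ../../lib/libc10_cuda.so", appears to come from CMake's LINK_WHAT_YOU_USE check, which runs ldd -u on the freshly linked library and lists DT_NEEDED entries whose symbols the library never actually references. It is informational and does not fail the build. Below is a minimal sketch for inspecting such entries yourself; the library path is illustrative, and binutils' readelf is assumed to be installed:

import re
import subprocess

def needed_sonames(path: str) -> list[str]:
    """Return the DT_NEEDED sonames recorded in an ELF shared object."""
    # readelf -d prints dynamic-section lines such as:
    #   0x0000000000000001 (NEEDED)  Shared library: [libc10.so.2.4]
    out = subprocess.run(
        ["readelf", "-d", path], capture_output=True, text=True, check=True
    ).stdout
    return re.findall(r"\(NEEDED\)\s+Shared library: \[([^\]]+)\]", out)

if __name__ == "__main__":
    # Hypothetical build-tree path; adjust to wherever libc10_cuda.so landed.
    for soname in needed_sonames("lib/libc10_cuda.so"):
        print(soname)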
Building C object confu-deps/XNNPACK/CMakeFiles/XNNPACK.dir/src/configs/gemm-config.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-8f8m9l4c1s1r-minmax-fp32-scalar-lrintf.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-9p1c-minmax-fp32-scalar-fmagic.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-9p1c-minmax-fp32-scalar-imagic.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-9p1c-minmax-fp32-scalar-lrintf.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-9p1c-minmax-rndnu-scalar.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/XNNPACK.dir/src/configs/ibilinear-chw-config.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-9p2c-minmax-fp32-scalar-fmagic.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-9p2c-minmax-fp32-scalar-imagic.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/XNNPACK.dir/src/configs/ibilinear-config.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-9p2c-minmax-fp32-scalar-lrintf.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/XNNPACK.dir/src/configs/lut32norm-config.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-9p2c-minmax-rndnu-scalar.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-9p4c-minmax-fp32-scalar-fmagic.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/XNNPACK.dir/src/configs/maxpool-config.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-9p4c-minmax-fp32-scalar-imagic.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/XNNPACK.dir/src/configs/pavgpool-config.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-9p4c-minmax-fp32-scalar-lrintf.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/XNNPACK.dir/src/configs/prelu-config.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-9p4c-minmax-rndnu-scalar.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-25p1c-minmax-fp32-scalar-fmagic.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/XNNPACK.dir/src/configs/raddstoreexpminusmax-config.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-25p1c-minmax-fp32-scalar-imagic.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/XNNPACK.dir/src/configs/reduce-config.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-25p1c-minmax-fp32-scalar-lrintf.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-25p2c-minmax-fp32-scalar-fmagic.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/XNNPACK.dir/src/configs/rmax-config.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-25p2c-minmax-fp32-scalar-imagic.c.o [ 21%] Building C object 
confu-deps/XNNPACK/CMakeFiles/XNNPACK.dir/src/configs/spmm-config.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-25p2c-minmax-fp32-scalar-lrintf.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/XNNPACK.dir/src/configs/transpose-config.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-25p4c-minmax-fp32-scalar-fmagic.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/XNNPACK.dir/src/configs/unary-elementwise-config.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-25p4c-minmax-fp32-scalar-imagic.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-25p4c-minmax-fp32-scalar-lrintf.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-f32-vcvt/gen/qs8-f32-vcvt-scalar-u1.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/XNNPACK.dir/src/configs/unpool-config.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-f32-vcvt/gen/qs8-f32-vcvt-scalar-u2.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-f32-vcvt/gen/qs8-f32-vcvt-scalar-u3.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/XNNPACK.dir/src/configs/vmulcaddc-config.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-f32-vcvt/gen/qs8-f32-vcvt-scalar-u4.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-gavgpool/gen/qs8-gavgpool-7p7x-minmax-fp32-scalar-fmagic-c1.c.o [ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-gavgpool/gen/qs8-gavgpool-7p7x-minmax-fp32-scalar-fmagic-c2.c.o [ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-gavgpool/gen/qs8-gavgpool-7p7x-minmax-fp32-scalar-fmagic-c4.c.o [ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/XNNPACK.dir/src/configs/xx-fill-config.c.o [ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-gavgpool/gen/qs8-gavgpool-7p7x-minmax-fp32-scalar-imagic-c1.c.o [ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/XNNPACK.dir/src/configs/xx-pad-config.c.o [ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-gavgpool/gen/qs8-gavgpool-7p7x-minmax-fp32-scalar-imagic-c2.c.o [ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-gavgpool/gen/qs8-gavgpool-7p7x-minmax-fp32-scalar-imagic-c4.c.o [ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/XNNPACK.dir/src/configs/x8-lut-config.c.o [ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-gavgpool/gen/qs8-gavgpool-7p7x-minmax-fp32-scalar-lrintf-c1.c.o [ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/XNNPACK.dir/src/configs/zip-config.c.o [ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-gavgpool/gen/qs8-gavgpool-7p7x-minmax-fp32-scalar-lrintf-c2.c.o [ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-gavgpool/gen/qs8-gavgpool-7p7x-minmax-fp32-scalar-lrintf-c4.c.o [ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-gavgpool/gen/qs8-gavgpool-7x-minmax-fp32-scalar-fmagic-c1.c.o [ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/XNNPACK.dir/src/init.c.o [ 22%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-gavgpool/gen/qs8-gavgpool-7x-minmax-fp32-scalar-fmagic-c2.c.o [ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-gavgpool/gen/qs8-gavgpool-7x-minmax-fp32-scalar-fmagic-c4.c.o [ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/XNNPACK.dir/src/params.c.o [ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-gavgpool/gen/qs8-gavgpool-7x-minmax-fp32-scalar-imagic-c1.c.o [ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-gavgpool/gen/qs8-gavgpool-7x-minmax-fp32-scalar-imagic-c2.c.o [ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-gavgpool/gen/qs8-gavgpool-7x-minmax-fp32-scalar-imagic-c4.c.o [ 22%] Linking CXX static library ../../lib/libXNNPACK.a [ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-gavgpool/gen/qs8-gavgpool-7x-minmax-fp32-scalar-lrintf-c1.c.o [ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-gavgpool/gen/qs8-gavgpool-7x-minmax-fp32-scalar-lrintf-c2.c.o [ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-gavgpool/gen/qs8-gavgpool-7x-minmax-fp32-scalar-lrintf-c4.c.o [ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-3p1c-minmax-fp32-scalar-fmagic.c.o [ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-3p2c-minmax-fp32-scalar-imagic.c.o [ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-3p2c-minmax-fp32-scalar-lrintf.c.o [ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-4p2c-minmax-fp32-scalar-imagic.c.o [ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-5f5m5l1c1s1r-minmax-fp32-scalar-fmagic.c.o [ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-5f5m5l1c1s1r-minmax-fp32-scalar-imagic.c.o [ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-5f5m5l1c1s1r-minmax-fp32-scalar-lrintf.c.o [ 22%] Built target XNNPACK [ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-5f5m5l2c1s1r-minmax-fp32-scalar-fmagic.c.o [ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-5f5m5l2c1s1r-minmax-fp32-scalar-imagic.c.o [ 22%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/AccumulateType.cpp.o [ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-5f5m5l2c1s1r-minmax-fp32-scalar-lrintf.c.o [ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-5f5m5l4c1s1r-minmax-fp32-scalar-fmagic.c.o [ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-5f5m5l4c1s1r-minmax-fp32-scalar-imagic.c.o [ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-5f5m5l4c1s1r-minmax-fp32-scalar-lrintf.c.o [ 22%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-6f6m7l1c1s1r-minmax-fp32-scalar-fmagic.c.o [ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-6f6m7l1c1s1r-minmax-fp32-scalar-imagic.c.o [ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-6f6m7l1c1s1r-minmax-fp32-scalar-lrintf.c.o [ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-6f6m7l2c1s1r-minmax-fp32-scalar-fmagic.c.o [ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-6f6m7l2c1s1r-minmax-fp32-scalar-imagic.c.o [ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-6f6m7l2c1s1r-minmax-fp32-scalar-lrintf.c.o [ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-6f6m7l4c1s1r-minmax-fp32-scalar-fmagic.c.o [ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-6f6m7l4c1s1r-minmax-fp32-scalar-imagic.c.o [ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-6f6m7l4c1s1r-minmax-fp32-scalar-lrintf.c.o [ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-8f8m9l1c1s1r-minmax-fp32-scalar-fmagic.c.o [ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-8f8m9l1c1s1r-minmax-fp32-scalar-imagic.c.o [ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-8f8m9l1c1s1r-minmax-fp32-scalar-lrintf.c.o [ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-8f8m9l2c1s1r-minmax-fp32-scalar-fmagic.c.o [ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-8f8m9l2c1s1r-minmax-fp32-scalar-imagic.c.o [ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-8f8m9l2c1s1r-minmax-fp32-scalar-lrintf.c.o [ 22%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/CPUGeneratorImpl.cpp.o [ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-8f8m9l4c1s1r-minmax-fp32-scalar-fmagic.c.o [ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-8f8m9l4c1s1r-minmax-fp32-scalar-imagic.c.o [ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-8f8m9l4c1s1r-minmax-fp32-scalar-lrintf.c.o [ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-9p1c-minmax-fp32-scalar-fmagic.c.o [ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-9p1c-minmax-fp32-scalar-imagic.c.o [ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-9p1c-minmax-fp32-scalar-lrintf.c.o [ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-9p2c-minmax-fp32-scalar-fmagic.c.o [ 22%] 
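Note that the libXNNPACK.a linked above is a static archive: its microkernels are folded into libtorch_cpu at link time rather than shipped as a separate runtime dependency, which is why the torch_cpu ATen objects begin interleaving here. A post-install sanity check, offered as a sketch: it assumes the torch package produced by this build is importable and uses only stock PyTorch introspection APIs:

import torch

print(torch.__version__)
# torch.backends.xnnpack.enabled reports whether this build carries XNNPACK.
print("XNNPACK enabled:", torch.backends.xnnpack.enabled)
# The compile-time configuration (USE_XNNPACK, CUDA version, compiler flags)
# is embedded in the binary and can be dumped directly:
print(torch.__config__.show())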
Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-9p2c-minmax-fp32-scalar-imagic.c.o [ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-9p2c-minmax-fp32-scalar-lrintf.c.o [ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-9p4c-minmax-fp32-scalar-fmagic.c.o [ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-9p4c-minmax-fp32-scalar-imagic.c.o [ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-9p4c-minmax-fp32-scalar-lrintf.c.o [ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-25p1c-minmax-fp32-scalar-fmagic.c.o [ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-25p1c-minmax-fp32-scalar-imagic.c.o [ 23%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-25p1c-minmax-fp32-scalar-lrintf.c.o [ 23%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-25p2c-minmax-fp32-scalar-fmagic.c.o [ 23%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-25p2c-minmax-fp32-scalar-imagic.c.o [ 23%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-25p2c-minmax-fp32-scalar-lrintf.c.o [ 23%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-25p4c-minmax-fp32-scalar-fmagic.c.o [ 23%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-25p4c-minmax-fp32-scalar-imagic.c.o [ 23%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-25p4c-minmax-fp32-scalar-lrintf.c.o [ 23%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-1x2-minmax-fp32-scalar-fmagic.c.o [ 23%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-1x2-minmax-fp32-scalar-imagic.c.o [ 23%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-1x2-minmax-fp32-scalar-lrintf.c.o [ 23%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-1x4-minmax-fp32-scalar-fmagic.c.o [ 23%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-1x4-minmax-fp32-scalar-imagic.c.o [ 23%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-1x4-minmax-fp32-scalar-lrintf.c.o [ 23%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-2x2-minmax-fp32-scalar-fmagic.c.o [ 23%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-2x2-minmax-fp32-scalar-imagic.c.o [ 23%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-2x2-minmax-fp32-scalar-lrintf.c.o [ 23%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-2x4-minmax-fp32-scalar-fmagic.c.o [ 23%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-2x4-minmax-fp32-scalar-imagic.c.o [ 23%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-2x4-minmax-fp32-scalar-lrintf.c.o [ 23%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-3x2-minmax-fp32-scalar-fmagic.c.o [ 23%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-3x2-minmax-fp32-scalar-imagic.c.o [ 23%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-3x2-minmax-fp32-scalar-lrintf.c.o [ 23%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-3x4-minmax-fp32-scalar-fmagic.c.o [ 23%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-3x4-minmax-fp32-scalar-imagic.c.o [ 23%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-3x4-minmax-fp32-scalar-lrintf.c.o [ 23%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-4x2-minmax-fp32-scalar-fmagic.c.o [ 23%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-4x2-minmax-fp32-scalar-imagic.c.o [ 23%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-4x2-minmax-fp32-scalar-lrintf.c.o [ 23%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-4x4-minmax-fp32-scalar-fmagic.c.o [ 23%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-4x4-minmax-fp32-scalar-imagic.c.o [ 23%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-4x4-minmax-fp32-scalar-lrintf.c.o [ 23%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-1x2-minmax-fp32-scalar-fmagic.c.o [ 23%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-1x2-minmax-fp32-scalar-imagic.c.o [ 23%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-1x2-minmax-fp32-scalar-lrintf.c.o [ 23%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-1x4-minmax-fp32-scalar-fmagic.c.o [ 23%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-1x4-minmax-fp32-scalar-imagic.c.o [ 23%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-1x4-minmax-fp32-scalar-lrintf.c.o [ 23%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/CachedTensorUtils.cpp.o [ 23%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/ConjugateFallback.cpp.o [ 23%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-2x2-minmax-fp32-scalar-fmagic.c.o [ 23%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-2x2-minmax-fp32-scalar-imagic.c.o [ 
23%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/Context.cpp.o [ 23%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-2x2-minmax-fp32-scalar-lrintf.c.o [ 23%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-2x4-minmax-fp32-scalar-fmagic.c.o [ 23%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-2x4-minmax-fp32-scalar-imagic.c.o [ 23%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-2x4-minmax-fp32-scalar-lrintf.c.o [ 23%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-3x2-minmax-fp32-scalar-fmagic.c.o [ 23%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-3x2-minmax-fp32-scalar-imagic.c.o [ 23%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-3x2-minmax-fp32-scalar-lrintf.c.o [ 23%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-3x4-minmax-fp32-scalar-fmagic.c.o [ 23%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-3x4-minmax-fp32-scalar-imagic.c.o [ 23%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-3x4-minmax-fp32-scalar-lrintf.c.o [ 23%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-4x2-minmax-fp32-scalar-fmagic.c.o [ 23%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-4x2-minmax-fp32-scalar-imagic.c.o [ 23%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-4x2-minmax-fp32-scalar-lrintf.c.o [ 23%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-4x4-minmax-fp32-scalar-fmagic.c.o [ 23%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-4x4-minmax-fp32-scalar-imagic.c.o [ 23%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-4x4-minmax-fp32-scalar-lrintf.c.o [ 23%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-requantization/qs8-requantization-fp32-scalar-fmagic.c.o [ 23%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-requantization/qs8-requantization-fp32-scalar-lrintf.c.o [ 23%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-requantization/qs8-requantization-gemmlowp-scalar.c.o [ 23%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-requantization/qs8-requantization-rndna-scalar-signed64.c.o [ 23%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-requantization/qs8-requantization-rndna-scalar-unsigned32.c.o [ 23%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/DLConvertor.cpp.o [ 24%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-requantization/qs8-requantization-rndna-scalar-unsigned64.c.o [ 24%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-requantization/qs8-requantization-rndnu-scalar.c.o [ 24%] 
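The qs8-requantization-* objects just compiled implement the final step of every quantized kernel: collapsing the 32-bit accumulator back into int8. The variants (fp32-scalar-fmagic, fp32-scalar-lrintf, gemmlowp, rndna, rndnu) appear to differ only in the rounding machinery: float "magic number" tricks, lrintf(), the legacy gemmlowp fixed-point scheme, and round-to-nearest with different tie-breaking, respectively. The operation they all approximate, sketched here for orientation:

\[
q = \operatorname{clamp}\!\bigl(\operatorname{round}(s \cdot a) + z,\; q_{\min},\; q_{\max}\bigr),
\]

where \(a\) is the int32 accumulator, \(s\) the requantization scale, \(z\) the output zero point, and \([q_{\min}, q_{\max}]\) the representable int8 range.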
Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vadd/gen/qs8-vadd-minmax-scalar-u1.c.o [ 24%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vadd/gen/qs8-vadd-minmax-scalar-u2.c.o [ 24%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vadd/gen/qs8-vadd-minmax-scalar-u4.c.o [ 24%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vaddc/gen/qs8-vaddc-minmax-scalar-u1.c.o [ 24%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vaddc/gen/qs8-vaddc-minmax-scalar-u2.c.o [ 24%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vaddc/gen/qs8-vaddc-minmax-scalar-u4.c.o [ 24%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vcvt/gen/qs8-vcvt-scalar-u1.c.o [ 24%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vcvt/gen/qs8-vcvt-scalar-u2.c.o [ 24%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vcvt/gen/qs8-vcvt-scalar-u4.c.o [ 24%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vhswish/gen/qs8-vhswish-scalar-u1.c.o [ 24%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vhswish/gen/qs8-vhswish-scalar-u2.c.o [ 24%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vhswish/gen/qs8-vhswish-scalar-u4.c.o [ 24%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vlrelu/gen/qs8-vlrelu-scalar-andxor-u1.c.o [ 24%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vlrelu/gen/qs8-vlrelu-scalar-andxor-u2.c.o [ 24%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vlrelu/gen/qs8-vlrelu-scalar-andxor-u4.c.o [ 24%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vlrelu/gen/qs8-vlrelu-scalar-select-u1.c.o [ 24%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vlrelu/gen/qs8-vlrelu-scalar-select-u2.c.o [ 24%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vlrelu/gen/qs8-vlrelu-scalar-select-u4.c.o [ 24%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vmul/gen/qs8-vmul-minmax-fp32-scalar-u1.c.o [ 24%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vmul/gen/qs8-vmul-minmax-fp32-scalar-u2.c.o [ 24%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vmul/gen/qs8-vmul-minmax-fp32-scalar-u4.c.o [ 24%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vmulc/gen/qs8-vmulc-minmax-fp32-scalar-u1.c.o [ 24%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vmulc/gen/qs8-vmulc-minmax-fp32-scalar-u2.c.o [ 24%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vmulc/gen/qs8-vmulc-minmax-fp32-scalar-u4.c.o [ 24%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs16-qs8-vcvt/gen/qs16-qs8-vcvt-scalar-u1.c.o [ 24%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs16-qs8-vcvt/gen/qs16-qs8-vcvt-scalar-u2.c.o [ 24%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs16-qs8-vcvt/gen/qs16-qs8-vcvt-scalar-u4.c.o [ 24%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-avgpool/qu8-avgpool-9p8x-minmax-fp32-scalar-imagic-c1.c.o [ 24%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-avgpool/qu8-avgpool-9x-minmax-fp32-scalar-imagic-c1.c.o [ 24%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-5f5m5l1c1s1r-minmax-fp32-scalar-fmagic.c.o [ 24%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-5f5m5l1c1s1r-minmax-fp32-scalar-imagic.c.o [ 24%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-5f5m5l1c1s1r-minmax-fp32-scalar-lrintf.c.o [ 24%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-5f5m5l2c1s1r-minmax-fp32-scalar-fmagic.c.o [ 24%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-5f5m5l2c1s1r-minmax-fp32-scalar-imagic.c.o [ 24%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-5f5m5l2c1s1r-minmax-fp32-scalar-lrintf.c.o [ 24%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-5f5m5l4c1s1r-minmax-fp32-scalar-fmagic.c.o [ 24%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-5f5m5l4c1s1r-minmax-fp32-scalar-imagic.c.o [ 24%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-5f5m5l4c1s1r-minmax-fp32-scalar-lrintf.c.o [ 24%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-6f6m7l1c1s1r-minmax-fp32-scalar-fmagic.c.o [ 24%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-6f6m7l1c1s1r-minmax-fp32-scalar-imagic.c.o [ 24%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-6f6m7l1c1s1r-minmax-fp32-scalar-lrintf.c.o [ 24%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-6f6m7l2c1s1r-minmax-fp32-scalar-fmagic.c.o [ 24%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-6f6m7l2c1s1r-minmax-fp32-scalar-imagic.c.o [ 24%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/DeviceAccelerator.cpp.o [ 24%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-6f6m7l2c1s1r-minmax-fp32-scalar-lrintf.c.o [ 24%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-6f6m7l4c1s1r-minmax-fp32-scalar-fmagic.c.o [ 24%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/Dispatch.cpp.o [ 24%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-6f6m7l4c1s1r-minmax-fp32-scalar-imagic.c.o [ 24%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-6f6m7l4c1s1r-minmax-fp32-scalar-lrintf.c.o [ 24%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-8f8m9l1c1s1r-minmax-fp32-scalar-fmagic.c.o [ 24%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-8f8m9l1c1s1r-minmax-fp32-scalar-imagic.c.o [ 24%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-8f8m9l1c1s1r-minmax-fp32-scalar-lrintf.c.o [ 24%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-8f8m9l2c1s1r-minmax-fp32-scalar-fmagic.c.o [ 24%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-8f8m9l2c1s1r-minmax-fp32-scalar-imagic.c.o [ 24%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/DynamicLibrary.cpp.o [ 24%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-8f8m9l2c1s1r-minmax-fp32-scalar-lrintf.c.o [ 24%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-8f8m9l4c1s1r-minmax-fp32-scalar-fmagic.c.o [ 24%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/EmptyTensor.cpp.o [ 24%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-8f8m9l4c1s1r-minmax-fp32-scalar-imagic.c.o [ 24%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-8f8m9l4c1s1r-minmax-fp32-scalar-lrintf.c.o [ 24%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-9p1c-minmax-fp32-scalar-fmagic.c.o [ 24%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-9p1c-minmax-fp32-scalar-imagic.c.o [ 25%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-9p1c-minmax-fp32-scalar-lrintf.c.o [ 25%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-9p1c-minmax-rndnu-scalar.c.o [ 25%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-9p2c-minmax-fp32-scalar-fmagic.c.o [ 25%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/ExpandUtils.cpp.o [ 25%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-9p2c-minmax-fp32-scalar-imagic.c.o [ 25%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-9p2c-minmax-fp32-scalar-lrintf.c.o [ 25%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-9p2c-minmax-rndnu-scalar.c.o [ 25%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-9p4c-minmax-fp32-scalar-fmagic.c.o [ 25%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-9p4c-minmax-fp32-scalar-imagic.c.o [ 25%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/FuncTorchTLS.cpp.o [ 25%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-9p4c-minmax-fp32-scalar-lrintf.c.o [ 25%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-9p4c-minmax-rndnu-scalar.c.o [ 25%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/FunctionalInverses.cpp.o [ 25%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-25p1c-minmax-fp32-scalar-fmagic.c.o [ 25%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-25p1c-minmax-fp32-scalar-imagic.c.o [ 25%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-25p1c-minmax-fp32-scalar-lrintf.c.o [ 25%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-25p2c-minmax-fp32-scalar-fmagic.c.o
[ 25%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-25p2c-minmax-fp32-scalar-imagic.c.o
[ 25%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/FunctionalStorageImpl.cpp.o
[ 25%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-25p2c-minmax-fp32-scalar-lrintf.c.o
[ 25%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-25p4c-minmax-fp32-scalar-fmagic.c.o
[ 25%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-25p4c-minmax-fp32-scalar-imagic.c.o
[ 25%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-25p4c-minmax-fp32-scalar-lrintf.c.o
[ 25%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-f32-vcvt/gen/qu8-f32-vcvt-scalar-u1.c.o
[ 25%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-f32-vcvt/gen/qu8-f32-vcvt-scalar-u2.c.o
[ 25%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-f32-vcvt/gen/qu8-f32-vcvt-scalar-u3.c.o
[ 25%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-f32-vcvt/gen/qu8-f32-vcvt-scalar-u4.c.o
[ 25%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gavgpool/gen/qu8-gavgpool-7p7x-minmax-fp32-scalar-fmagic-c1.c.o
[ 25%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gavgpool/gen/qu8-gavgpool-7p7x-minmax-fp32-scalar-fmagic-c2.c.o
[ 25%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gavgpool/gen/qu8-gavgpool-7p7x-minmax-fp32-scalar-fmagic-c4.c.o
[ 25%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gavgpool/gen/qu8-gavgpool-7p7x-minmax-fp32-scalar-imagic-c1.c.o
[ 25%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gavgpool/gen/qu8-gavgpool-7p7x-minmax-fp32-scalar-imagic-c2.c.o
[ 25%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gavgpool/gen/qu8-gavgpool-7p7x-minmax-fp32-scalar-imagic-c4.c.o
[ 25%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gavgpool/gen/qu8-gavgpool-7p7x-minmax-fp32-scalar-lrintf-c1.c.o
[ 25%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gavgpool/gen/qu8-gavgpool-7p7x-minmax-fp32-scalar-lrintf-c2.c.o
[ 25%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gavgpool/gen/qu8-gavgpool-7p7x-minmax-fp32-scalar-lrintf-c4.c.o
[ 25%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gavgpool/gen/qu8-gavgpool-7x-minmax-fp32-scalar-fmagic-c1.c.o
[ 25%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gavgpool/gen/qu8-gavgpool-7x-minmax-fp32-scalar-fmagic-c2.c.o
[ 25%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gavgpool/gen/qu8-gavgpool-7x-minmax-fp32-scalar-fmagic-c4.c.o
[ 25%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gavgpool/gen/qu8-gavgpool-7x-minmax-fp32-scalar-imagic-c1.c.o
[ 25%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gavgpool/gen/qu8-gavgpool-7x-minmax-fp32-scalar-imagic-c2.c.o
[ 25%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gavgpool/gen/qu8-gavgpool-7x-minmax-fp32-scalar-imagic-c4.c.o
[ 25%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gavgpool/gen/qu8-gavgpool-7x-minmax-fp32-scalar-lrintf-c1.c.o
[ 25%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gavgpool/gen/qu8-gavgpool-7x-minmax-fp32-scalar-lrintf-c2.c.o
[ 25%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gavgpool/gen/qu8-gavgpool-7x-minmax-fp32-scalar-lrintf-c4.c.o
[ 25%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-1x2-minmax-fp32-scalar-fmagic.c.o
[ 25%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/FunctionalTensorWrapper.cpp.o
[ 25%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-1x2-minmax-fp32-scalar-imagic.c.o
[ 25%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-1x2-minmax-fp32-scalar-lrintf.c.o
[ 25%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-1x2-minmax-rndnu-scalar.c.o
[ 25%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-1x4-minmax-fp32-scalar-fmagic.c.o
[ 25%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-1x4-minmax-fp32-scalar-imagic.c.o
[ 25%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-1x4-minmax-fp32-scalar-lrintf.c.o
[ 25%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-1x4-minmax-rndnu-scalar.c.o
[ 25%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-2x2-minmax-fp32-scalar-fmagic.c.o
[ 25%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-2x2-minmax-fp32-scalar-imagic.c.o
[ 25%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-2x2-minmax-fp32-scalar-lrintf.c.o
[ 25%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-2x2-minmax-rndnu-scalar.c.o
[ 25%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-2x4-minmax-fp32-scalar-fmagic.c.o
[ 25%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-2x4-minmax-fp32-scalar-imagic.c.o
[ 25%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-2x4-minmax-fp32-scalar-lrintf.c.o
[ 25%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-2x4-minmax-rndnu-scalar.c.o
[ 25%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-3x2-minmax-fp32-scalar-fmagic.c.o
[ 25%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-3x2-minmax-fp32-scalar-imagic.c.o
[ 26%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-3x2-minmax-fp32-scalar-lrintf.c.o
[ 26%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/FunctionalizeFallbackKernel.cpp.o
[ 26%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-3x2-minmax-rndnu-scalar.c.o
[ 26%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-3x4-minmax-fp32-scalar-fmagic.c.o
[ 26%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-3x4-minmax-fp32-scalar-imagic.c.o
[ 26%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-3x4-minmax-fp32-scalar-lrintf.c.o
[ 26%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-3x4-minmax-rndnu-scalar.c.o
[ 26%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-4x2-minmax-fp32-scalar-fmagic.c.o
[ 26%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-4x2-minmax-fp32-scalar-imagic.c.o
[ 26%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-4x2-minmax-fp32-scalar-lrintf.c.o
[ 26%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-4x2-minmax-rndnu-scalar.c.o
[ 26%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-4x4-minmax-fp32-scalar-fmagic.c.o
[ 26%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-4x4-minmax-fp32-scalar-imagic.c.o
[ 26%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-4x4-minmax-fp32-scalar-lrintf.c.o
[ 26%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-4x4-minmax-rndnu-scalar.c.o
[ 26%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-1x2-minmax-fp32-scalar-fmagic.c.o
[ 26%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-1x2-minmax-fp32-scalar-imagic.c.o
[ 26%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-1x2-minmax-fp32-scalar-lrintf.c.o
[ 26%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-1x2-minmax-rndnu-scalar.c.o
[ 26%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-1x4-minmax-fp32-scalar-fmagic.c.o
[ 26%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-1x4-minmax-fp32-scalar-imagic.c.o
[ 26%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-1x4-minmax-fp32-scalar-lrintf.c.o
[ 26%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-1x4-minmax-rndnu-scalar.c.o
[ 26%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-2x2-minmax-fp32-scalar-fmagic.c.o
[ 26%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-2x2-minmax-fp32-scalar-imagic.c.o
[ 26%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-2x2-minmax-fp32-scalar-lrintf.c.o
[ 26%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-2x2-minmax-rndnu-scalar.c.o
[ 26%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-2x4-minmax-fp32-scalar-fmagic.c.o
[ 26%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-2x4-minmax-fp32-scalar-imagic.c.o
[ 26%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/LegacyBatchedFallback.cpp.o
[ 26%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-2x4-minmax-fp32-scalar-lrintf.c.o
[ 26%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-2x4-minmax-rndnu-scalar.c.o
[ 26%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-3x2-minmax-fp32-scalar-fmagic.c.o
[ 26%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-3x2-minmax-fp32-scalar-imagic.c.o
[ 26%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-3x2-minmax-fp32-scalar-lrintf.c.o
[ 26%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-3x2-minmax-rndnu-scalar.c.o
[ 26%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-3x4-minmax-fp32-scalar-fmagic.c.o
[ 26%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-3x4-minmax-fp32-scalar-imagic.c.o
[ 26%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-3x4-minmax-fp32-scalar-lrintf.c.o
[ 26%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-3x4-minmax-rndnu-scalar.c.o
[ 26%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-4x2-minmax-fp32-scalar-fmagic.c.o
[ 26%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-4x2-minmax-fp32-scalar-imagic.c.o
[ 26%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-4x2-minmax-fp32-scalar-lrintf.c.o
[ 26%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-4x2-minmax-rndnu-scalar.c.o
[ 26%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-4x4-minmax-fp32-scalar-fmagic.c.o
[ 26%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/LegacyBatchedTensorImpl.cpp.o
[ 26%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-4x4-minmax-fp32-scalar-imagic.c.o
[ 26%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-4x4-minmax-fp32-scalar-lrintf.c.o
[ 26%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-4x4-minmax-rndnu-scalar.c.o
[ 26%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-requantization/qu8-requantization-fp32-scalar-fmagic.c.o
[ 26%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-requantization/qu8-requantization-fp32-scalar-lrintf.c.o
[ 26%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-requantization/qu8-requantization-gemmlowp-scalar.c.o
[ 26%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-requantization/qu8-requantization-rndna-scalar-signed64.c.o
[ 26%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-requantization/qu8-requantization-rndna-scalar-unsigned32.c.o
[ 26%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-requantization/qu8-requantization-rndna-scalar-unsigned64.c.o
[ 26%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vadd/gen/qu8-vadd-minmax-scalar-u1.c.o
[ 26%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vadd/gen/qu8-vadd-minmax-scalar-u2.c.o
[ 26%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vadd/gen/qu8-vadd-minmax-scalar-u4.c.o
[ 26%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vaddc/gen/qu8-vaddc-minmax-scalar-u1.c.o
[ 26%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vaddc/gen/qu8-vaddc-minmax-scalar-u2.c.o
[ 26%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vaddc/gen/qu8-vaddc-minmax-scalar-u4.c.o
[ 26%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vcvt/gen/qu8-vcvt-scalar-u1.c.o
[ 26%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vcvt/gen/qu8-vcvt-scalar-u2.c.o
[ 27%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vcvt/gen/qu8-vcvt-scalar-u4.c.o
[ 27%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vhswish/gen/qu8-vhswish-scalar-u1.c.o
[ 27%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vhswish/gen/qu8-vhswish-scalar-u2.c.o
[ 27%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vhswish/gen/qu8-vhswish-scalar-u4.c.o
[ 27%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vlrelu/gen/qu8-vlrelu-scalar-andxor-u1.c.o
[ 27%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vlrelu/gen/qu8-vlrelu-scalar-andxor-u2.c.o
[ 27%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vlrelu/gen/qu8-vlrelu-scalar-andxor-u4.c.o
[ 27%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vlrelu/gen/qu8-vlrelu-scalar-select-u1.c.o
[ 27%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vlrelu/gen/qu8-vlrelu-scalar-select-u2.c.o
[ 27%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vlrelu/gen/qu8-vlrelu-scalar-select-u4.c.o
[ 27%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vmul/gen/qu8-vmul-minmax-fp32-scalar-u1.c.o
[ 27%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vmul/gen/qu8-vmul-minmax-fp32-scalar-u2.c.o
[ 27%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vmul/gen/qu8-vmul-minmax-fp32-scalar-u4.c.o
[ 27%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vmulc/gen/qu8-vmulc-minmax-fp32-scalar-u1.c.o
[ 27%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vmulc/gen/qu8-vmulc-minmax-fp32-scalar-u2.c.o
[ 27%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vmulc/gen/qu8-vmulc-minmax-fp32-scalar-u4.c.o
[ 27%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/s8-ibilinear/gen/s8-ibilinear-scalar-c1.c.o
[ 27%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/s8-ibilinear/gen/s8-ibilinear-scalar-c2.c.o
[ 27%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/LegacyBatchingRegistrations.cpp.o
[ 27%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/s8-ibilinear/gen/s8-ibilinear-scalar-c4.c.o
[ 27%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/s8-maxpool/s8-maxpool-9p8x-minmax-scalar-c1.c.o
[ 27%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/s8-vclamp/s8-vclamp-scalar-u4.c.o
[ 27%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/s16-rmaxabs/gen/s16-rmaxabs-scalar-x1.c.o
[ 27%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/s16-rmaxabs/gen/s16-rmaxabs-scalar-x2.c.o
[ 27%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/s16-rmaxabs/gen/s16-rmaxabs-scalar-x3.c.o
[ 27%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/s16-rmaxabs/gen/s16-rmaxabs-scalar-x4.c.o
[ 27%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/s16-window/gen/s16-window-scalar-u1.c.o
[ 27%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/s16-window/gen/s16-window-scalar-u2.c.o
[ 27%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/s16-window/gen/s16-window-scalar-u3.c.o
[ 27%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/s16-window/gen/s16-window-scalar-u4.c.o
[ 27%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/u8-ibilinear/gen/u8-ibilinear-scalar-c1.c.o
[ 27%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/u8-ibilinear/gen/u8-ibilinear-scalar-c2.c.o
[ 27%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/u8-ibilinear/gen/u8-ibilinear-scalar-c4.c.o
[ 27%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/u8-lut32norm/u8-lut32norm-scalar.c.o
[ 27%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/u8-maxpool/u8-maxpool-9p8x-minmax-scalar-c1.c.o
[ 27%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/u8-rmax/u8-rmax-scalar-u2.c.o
[ 27%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/u8-vclamp/u8-vclamp-scalar-u4.c.o
[ 27%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/u32-filterbank-accumulate/gen/u32-filterbank-accumulate-scalar-x1.c.o
[ 27%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/u32-filterbank-subtract/u32-filterbank-subtract-scalar-x2.c.o
[ 27%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/u32-vlog/gen/u32-vlog-scalar-x1.c.o
[ 27%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/u32-vlog/gen/u32-vlog-scalar-x2.c.o
[ 27%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/u32-vlog/gen/u32-vlog-scalar-x3.c.o
[ 27%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/u32-vlog/gen/u32-vlog-scalar-x4.c.o
[ 27%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/u64-u32-vsqrtshift/u64-u32-vsqrtshift-scalar-cvtu32-sqrt-cvtu32f64-u1.c.o
[ 27%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x8-lut/gen/x8-lut-scalar-u1.c.o
[ 27%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x8-lut/gen/x8-lut-scalar-u2.c.o
[ 27%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x8-lut/gen/x8-lut-scalar-u4.c.o
[ 27%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x8-lut/gen/x8-lut-scalar-u8.c.o
[ 27%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x8-lut/gen/x8-lut-scalar-u16.c.o
[ 27%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x8-packw/gen/x8-packw-x2-gemm-goi-scalar-int-u2.c.o
[ 27%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x8-packw/gen/x8-packw-x2-gemm-goi-scalar-int-u4.c.o
[ 27%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x8-packw/gen/x8-packw-x4-gemm-goi-scalar-int-u2.c.o
[ 27%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/LegacyVmapMode.cpp.o
[ 27%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x8-packw/gen/x8-packw-x4-gemm-goi-scalar-int-u4.c.o
[ 27%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x8-packw/gen/x8-packw-x8-gemm-goi-scalar-int-u2.c.o
[ 27%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x8-packw/gen/x8-packw-x8-gemm-goi-scalar-int-u4.c.o
[ 27%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x8-packw/gen/x8-packw-x16-gemm-goi-scalar-int-u2.c.o
[ 27%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x8-packw/gen/x8-packw-x16-gemm-goi-scalar-int-u4.c.o
[ 27%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x8-packw/gen/x8-packw-x32-gemm-goi-scalar-int-u2.c.o
[ 27%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/LegacyVmapTransforms.cpp.o
[ 27%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x8-packw/gen/x8-packw-x32-gemm-goi-scalar-int-u4.c.o
[ 27%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x8-transposec/gen/x8-transposec-1x2-scalar-int.c.o
[ 27%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x8-transposec/gen/x8-transposec-1x4-scalar-int.c.o
[ 28%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x8-transposec/gen/x8-transposec-2x1-scalar-int.c.o
[ 28%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x8-transposec/gen/x8-transposec-2x2-scalar-int.c.o
[ 28%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x8-transposec/gen/x8-transposec-2x4-scalar-int.c.o
[ 28%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x8-transposec/gen/x8-transposec-4x1-scalar-int.c.o
[ 28%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x8-transposec/gen/x8-transposec-4x2-scalar-int.c.o
[ 28%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x8-transposec/gen/x8-transposec-4x4-scalar-int.c.o
[ 28%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x8-zip/x8-zip-x2-scalar.c.o
[ 28%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x8-zip/x8-zip-x3-scalar.c.o
[ 28%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x8-zip/x8-zip-x4-scalar.c.o
[ 28%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x8-zip/x8-zip-xm-scalar.c.o
[ 28%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x16-packw/gen/x16-packw-x8-gemm-goi-scalar-int-u4.c.o
[ 28%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x16-packw/gen/x16-packw-x16-gemm-goi-scalar-int-u4.c.o
[ 28%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x16-transposec/gen/x16-transposec-1x2-scalar-int.c.o
[ 28%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x16-transposec/gen/x16-transposec-1x4-scalar-int.c.o
[ 28%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x16-transposec/gen/x16-transposec-2x1-scalar-int.c.o
[ 28%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x16-transposec/gen/x16-transposec-2x2-scalar-int.c.o
[ 28%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x16-transposec/gen/x16-transposec-2x4-scalar-int.c.o
[ 28%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x16-transposec/gen/x16-transposec-4x1-scalar-int.c.o
[ 28%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x16-transposec/gen/x16-transposec-4x2-scalar-int.c.o
[ 28%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x16-transposec/gen/x16-transposec-4x4-scalar-int.c.o
[ 28%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x24-transposec/gen/x24-transposec-1x2-scalar.c.o
[ 28%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x24-transposec/gen/x24-transposec-1x4-scalar.c.o
[ 28%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x24-transposec/gen/x24-transposec-2x1-scalar.c.o
[ 28%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x24-transposec/gen/x24-transposec-2x2-scalar.c.o
[ 28%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x24-transposec/gen/x24-transposec-2x4-scalar.c.o
[ 28%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x24-transposec/gen/x24-transposec-4x1-scalar.c.o
[ 28%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x24-transposec/gen/x24-transposec-4x2-scalar.c.o
[ 28%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x24-transposec/gen/x24-transposec-4x4-scalar.c.o
[ 28%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-packb/gen/x32-packb-2c1s1r-gemm-scalar-float.c.o
[ 28%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-packb/gen/x32-packb-2c1s1r-gemm-scalar-int.c.o
[ 28%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-packb/gen/x32-packb-2c2s1r-gemm-scalar-float.c.o
[ 28%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-packb/gen/x32-packb-2c2s1r-gemm-scalar-int.c.o
[ 28%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/MapAllocator.cpp.o
[ 28%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-packb/gen/x32-packb-4c1s1r-gemm-scalar-float.c.o
[ 28%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-packb/gen/x32-packb-4c1s1r-gemm-scalar-int.c.o
[ 28%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-packb/gen/x32-packb-4c4s1r-gemm-scalar-float.c.o
[ 28%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-packb/gen/x32-packb-4c4s1r-gemm-scalar-int.c.o
[ 28%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-packw/gen/x32-packw-x2-gemm-goi-scalar-float-u4.c.o
[ 28%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-packw/gen/x32-packw-x2-gemm-goi-scalar-int-u4.c.o
[ 28%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-packw/gen/x32-packw-x3-gemm-goi-scalar-float-u4.c.o
[ 28%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-packw/gen/x32-packw-x3-gemm-goi-scalar-int-u4.c.o
[ 28%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-packw/gen/x32-packw-x4-gemm-goi-scalar-float-u4.c.o
[ 28%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-packw/gen/x32-packw-x4-gemm-goi-scalar-int-u4.c.o
[ 28%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-packw/gen/x32-packw-x8-gemm-goi-scalar-float-u4.c.o
[ 28%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-packw/gen/x32-packw-x8-gemm-goi-scalar-int-u4.c.o
[ 28%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-packw/gen/x32-packw-x16-gemm-goi-scalar-float-u4.c.o
[ 28%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-packw/gen/x32-packw-x16-gemm-goi-scalar-int-u4.c.o
[ 28%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-packx/x32-packx-2x-scalar.c.o
[ 28%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-packx/x32-packx-3x-scalar.c.o
[ 28%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-packx/x32-packx-4x-scalar.c.o
[ 28%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-transposec/gen/x32-transposec-1x2-scalar-float.c.o
[ 28%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-transposec/gen/x32-transposec-1x2-scalar-int.c.o
[ 28%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-transposec/gen/x32-transposec-1x4-scalar-float.c.o
[ 28%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-transposec/gen/x32-transposec-1x4-scalar-int.c.o
[ 28%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-transposec/gen/x32-transposec-2x1-scalar-float.c.o
[ 28%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-transposec/gen/x32-transposec-2x1-scalar-int.c.o
[ 28%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-transposec/gen/x32-transposec-2x2-scalar-float.c.o
[ 28%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-transposec/gen/x32-transposec-2x2-scalar-int.c.o
[ 28%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-transposec/gen/x32-transposec-2x4-scalar-float.c.o
[ 28%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-transposec/gen/x32-transposec-2x4-scalar-int.c.o
[ 29%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-transposec/gen/x32-transposec-4x1-scalar-float.c.o
[ 29%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-transposec/gen/x32-transposec-4x1-scalar-int.c.o
[ 29%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-transposec/gen/x32-transposec-4x2-scalar-float.c.o
[ 29%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-transposec/gen/x32-transposec-4x2-scalar-int.c.o
[ 29%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-transposec/gen/x32-transposec-4x4-scalar-float.c.o
[ 29%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-transposec/gen/x32-transposec-4x4-scalar-int.c.o
[ 29%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-unpool/x32-unpool-scalar.c.o
[ 29%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-zerob/gen/x32-zerob-2c1s1r-gemm-scalar-float.c.o
[ 29%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-zerob/gen/x32-zerob-2c1s1r-gemm-scalar-int.c.o
[ 29%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-zerob/gen/x32-zerob-2c2s1r-gemm-scalar-float.c.o
[ 29%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-zerob/gen/x32-zerob-2c2s1r-gemm-scalar-int.c.o
[ 29%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-zerob/gen/x32-zerob-4c1s1r-gemm-scalar-float.c.o
[ 29%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-zerob/gen/x32-zerob-4c1s1r-gemm-scalar-int.c.o
[ 29%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-zerob/gen/x32-zerob-4c4s1r-gemm-scalar-float.c.o
[ 29%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-zerob/gen/x32-zerob-4c4s1r-gemm-scalar-int.c.o
[ 29%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-zip/x32-zip-x2-scalar.c.o
[ 29%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-zip/x32-zip-x3-scalar.c.o
[ 29%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-zip/x32-zip-x4-scalar.c.o
[ 29%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-zip/x32-zip-xm-scalar.c.o
[ 29%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x64-transposec/gen/x64-transposec-1x2-scalar-float.c.o
[ 29%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x64-transposec/gen/x64-transposec-1x2-scalar-int.c.o
[ 29%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x64-transposec/gen/x64-transposec-2x1-scalar-float.c.o
[ 29%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x64-transposec/gen/x64-transposec-2x1-scalar-int.c.o
[ 29%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x64-transposec/gen/x64-transposec-2x2-scalar-float.c.o
[ 29%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x64-transposec/gen/x64-transposec-2x2-scalar-int.c.o
[ 29%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x64-transposec/gen/x64-transposec-4x1-scalar-float.c.o
[ 29%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x64-transposec/gen/x64-transposec-4x1-scalar-int.c.o
[ 29%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x64-transposec/gen/x64-transposec-4x2-scalar-float.c.o
[ 29%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x64-transposec/gen/x64-transposec-4x2-scalar-int.c.o
[ 29%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/xx-copy/xx-copy-scalar-memcpy.c.o
[ 29%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/xx-fill/xx-fill-scalar-u16.c.o
[ 29%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/xx-pad/xx-pad-p4-scalar-u16.c.o
[ 29%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/xx-transposev/xx-transposev-1x1-scalar-memcpy.c.o
[ 29%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-fma-expm1minus-rr1-lut8-p4h3ts-div-u1.c.o
[ 29%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-fma-expm1minus-rr1-lut8-p4h3ts-div-u2.c.o
[ 29%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-fma-expm1minus-rr1-lut8-p4h3ts-div-u4.c.o
[ 29%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-fma-expm1minus-rr1-p6h5ts-div-u1.c.o
[ 29%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-fma-expm1minus-rr1-p6h5ts-div-u2.c.o
[ 29%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-fma-expm1minus-rr1-p6h5ts-div-u4.c.o
[ 29%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1minus-rr1-lut4-p4h2ts-div.c.o
[ 29%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1minus-rr1-lut4-p4h2ts-rcp.c.o
[ 29%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1minus-rr1-lut4-p4h3ps-div.c.o
[ 29%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1minus-rr1-lut4-p4h3ps-rcp.c.o
[ 29%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1minus-rr1-lut4-p4h3ts-div.c.o
[ 29%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1minus-rr1-lut4-p4h3ts-rcp.c.o
[ 29%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1minus-rr1-lut8-p3h1ts-div.c.o
[ 29%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/MemoryOverlap.cpp.o
[ 29%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1minus-rr1-lut8-p4h2ts-div.c.o
[ 29%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1minus-rr1-lut8-p4h2ts-rcp.c.o
[ 29%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1minus-rr1-lut8-p4h3ps-div.c.o
[ 29%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1minus-rr1-lut8-p4h3ps-rcp.c.o
[ 29%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1minus-rr1-lut8-p4h3ts-div.c.o
[ 29%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1minus-rr1-lut8-p4h3ts-rcp.c.o
[ 29%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1minus-rr1-lut16-p3h1ts-div.c.o
[ 29%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/NamedTensorUtils.cpp.o
[ 29%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1minus-rr1-lut16-p4h2ts-div.c.o
[ 29%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1minus-rr1-lut16-p4h2ts-rcp.c.o
[ 29%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1minus-rr1-lut16-p4h3ps-div.c.o
[ 29%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1minus-rr1-lut16-p4h3ts-div.c.o
[ 29%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1minus-rr1-lut32-p3h1ts-div.c.o
[ 29%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1minus-rr1-lut64-p3h1ts-div.c.o
[ 29%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1minus-rr1-p6h4ts-div.c.o
[ 30%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1minus-rr1-p6h5ps-div.c.o
[ 30%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1minus-rr1-p6h5ps-rcp.c.o
[ 30%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1minus-rr1-p6h5ts-div.c.o
[ 30%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1minus-rr1-p6h5ts-rcp.c.o
[ 30%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1minus-rr2-lut4-p4h2ts-div.c.o
[ 30%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1minus-rr2-lut4-p4h3ps-div.c.o
[ 30%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1minus-rr2-lut4-p4h3ts-div.c.o
[ 30%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1minus-rr2-lut8-p3h1ts-div.c.o
[ 30%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1minus-rr2-lut8-p4h2ts-div.c.o
[ 30%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1minus-rr2-lut8-p4h2ts-rcp.c.o
[ 30%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1minus-rr2-lut8-p4h3ps-div.c.o
[ 30%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1minus-rr2-lut8-p4h3ts-div.c.o
[ 30%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1minus-rr2-lut16-p3h1ts-div.c.o
[ 30%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1minus-rr2-lut16-p4h2ts-div.c.o
[ 30%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1minus-rr2-lut16-p4h3ps-div.c.o
[ 30%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1minus-rr2-lut16-p4h3ts-div.c.o
[ 30%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1minus-rr2-lut32-p3h1ts-div.c.o
[ 30%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1minus-rr2-lut64-p3h1ts-div.c.o
[ 30%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/NestedTensorImpl.cpp.o
[ 30%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1minus-rr2-p6h4ts-div.c.o
[ 30%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1minus-rr2-p6h5ps-div.c.o
[ 30%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1minus-rr2-p6h5ts-div.c.o
[ 30%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1plus-rr1-lut4-p4h2ts-div.c.o
[ 30%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1plus-rr1-lut4-p4h3ps-div.c.o
[ 30%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1plus-rr1-lut4-p4h3ts-div.c.o
[ 30%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1plus-rr1-lut8-p3h1ts-div.c.o
[ 30%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1plus-rr1-lut8-p4h2ts-div.c.o
[ 30%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1plus-rr1-lut8-p4h3ps-div.c.o
[ 30%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1plus-rr1-lut8-p4h3ts-div.c.o
[ 30%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1plus-rr1-lut16-p3h1ts-div.c.o
[ 30%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1plus-rr1-lut16-p4h2ts-div.c.o
[ 30%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1plus-rr1-lut16-p4h3ps-div.c.o
[ 30%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1plus-rr1-lut16-p4h3ts-div.c.o
[ 30%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1plus-rr1-lut32-p3h1ts-div.c.o
[ 30%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1plus-rr1-lut64-p3h1ts-div.c.o
[ 30%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1plus-rr1-p6h4ts-div.c.o
[ 30%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1plus-rr1-p6h5ps-div.c.o
[ 30%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1plus-rr1-p6h5ts-div.c.o
[ 30%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1plus-rr2-lut4-p4h2ts-div.c.o
[ 30%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1plus-rr2-lut4-p4h3ps-div.c.o
[ 30%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1plus-rr2-lut4-p4h3ts-div.c.o
[ 30%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1plus-rr2-lut8-p3h1ts-div.c.o
[ 30%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1plus-rr2-lut8-p4h2ts-div.c.o
[ 30%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1plus-rr2-lut8-p4h3ps-div.c.o
[ 30%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1plus-rr2-lut8-p4h3ts-div.c.o
[ 30%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1plus-rr2-lut16-p3h1ts-div.c.o
[ 30%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1plus-rr2-lut16-p4h2ts-div.c.o
[ 30%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1plus-rr2-lut16-p4h3ps-div.c.o
[ 30%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1plus-rr2-lut16-p4h3ts-div.c.o
[ 30%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1plus-rr2-lut32-p3h1ts-div.c.o
[ 30%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1plus-rr2-lut64-p3h1ts-div.c.o
[ 30%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1plus-rr2-p6h4ts-div.c.o
[ 30%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1plus-rr2-p6h5ps-div.c.o
[ 30%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1plus-rr2-p6h5ts-div.c.o
[ 30%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/cs16-bfly4/cs16-bfly4-neon-x1.c.o
[ 30%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/cs16-bfly4/cs16-bfly4-neon-x4.c.o
[ 30%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/cs16-bfly4/cs16-bfly4-samples1-neon.c.o
[ 30%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/cs16-bfly4/cs16-bfly4-samples4-neon.c.o
[ 30%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/cs16-fftr/cs16-fftr-neon-x4.c.o
[ 30%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/cs16-vsquareabs/gen/cs16-vsquareabs-neon-mlal-ld128-x4.c.o
[ 30%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/cs16-vsquareabs/gen/cs16-vsquareabs-neon-mlal-ld128-x8.c.o
[ 31%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/cs16-vsquareabs/gen/cs16-vsquareabs-neon-mlal-ld128-x12.c.o
[ 31%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/cs16-vsquareabs/gen/cs16-vsquareabs-neon-mlal-ld128-x16.c.o
[ 31%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-f32-vcvt/gen/f16-f32-vcvt-neon-int16-u8.c.o
[ 31%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-f32-vcvt/gen/f16-f32-vcvt-neon-int16-u16.c.o
[ 31%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-f32-vcvt/gen/f16-f32-vcvt-neon-int16-u24.c.o
[ 31%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-f32-vcvt/gen/f16-f32-vcvt-neon-int16-u32.c.o
[ 31%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-f32-vcvt/gen/f16-f32-vcvt-neon-int32-u8.c.o
[ 31%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-f32-vcvt/gen/f16-f32-vcvt-neon-int32-u16.c.o
[ 31%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-f32-vcvt/gen/f16-f32-vcvt-neon-int32-u24.c.o
[ 31%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-f32-vcvt/gen/f16-f32-vcvt-neon-int32-u32.c.o
[ 31%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-argmaxpool/f32-argmaxpool-4x-neon-c4.c.o
[ 31%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-argmaxpool/f32-argmaxpool-9p8x-neon-c4.c.o
[ 31%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-argmaxpool/f32-argmaxpool-9x-neon-c4.c.o
[ 31%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-avgpool/f32-avgpool-9p8x-minmax-neon-c4.c.o
[ 31%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-avgpool/f32-avgpool-9x-minmax-neon-c4.c.o
[ 31%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-conv-hwc2chw/f32-conv-hwc2chw-3x3s2p1c3x4-neon-2x2.c.o
[ 31%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-conv-hwc/gen/f32-conv-hwc-3x3s2p0p1c3x4-neon-2x1.c.o
[ 31%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-conv-hwc/gen/f32-conv-hwc-3x3s2p0p1c3x4-neon-2x2.c.o
[ 31%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-conv-hwc/gen/f32-conv-hwc-3x3s2p0p1c3x8-neon-2x1.c.o
[ 31%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-conv-hwc/gen/f32-conv-hwc-3x3s2p0p1c3x8-neon-2x2.c.o
[ 31%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-conv-hwc/gen/f32-conv-hwc-3x3s2p1c3x4-neon-2x1.c.o
[ 31%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-conv-hwc/gen/f32-conv-hwc-3x3s2p1c3x4-neon-2x2.c.o
[ 31%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-conv-hwc/gen/f32-conv-hwc-3x3s2p1c3x8-neon-2x1.c.o
[ 31%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-conv-hwc/gen/f32-conv-hwc-3x3s2p1c3x8-neon-2x2.c.o
[ 31%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-3x3p1-minmax-neon-1x4-acc2.c.o
[ 31%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-3x3p1-minmax-neon-1x4-acc3.c.o
[ 31%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-3x3p1-minmax-neon-1x4-acc4.c.o
[ 31%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-3x3p1-minmax-neon-1x4.c.o
[ 31%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-3x3p1-minmax-neon-2x4-acc2.c.o
[ 31%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-3x3p1-minmax-neon-2x4.c.o
[ 31%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-3x3p1-minmax-neon-3x4.c.o
[ 31%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-3x3p1-minmax-neon-4x4.c.o
[ 31%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-3x3p1-minmax-neon-5x4.c.o
[ 31%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/ParallelCommon.cpp.o
[ 31%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-3x3p1-minmax-neon-6x4.c.o
[ 31%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-3x3s2p1-minmax-neon-1x4-acc2.c.o
[ 31%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-3x3s2p1-minmax-neon-1x4-acc3.c.o
[ 31%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-3x3s2p1-minmax-neon-1x4-acc4.c.o
[ 31%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/ParallelNative.cpp.o
[ 31%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/ParallelNativeTBB.cpp.o
[ 31%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/ParallelOpenMP.cpp.o
[ 31%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-3x3s2p1-minmax-neon-1x4.c.o
[ 31%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-3x3s2p1-minmax-neon-2x4-acc2.c.o
[ 31%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-3x3s2p1-minmax-neon-2x4.c.o
[ 31%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-3x3s2p1-minmax-neon-3x4.c.o
[ 32%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/ParallelThreadPoolNative.cpp.o
[ 32%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-3x3s2p1-minmax-neon-4x4.c.o
[ 32%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/PythonTorchFunctionTLS.cpp.o
[ 32%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-5x5p2-minmax-neon-1x4-acc2.c.o
[ 32%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-5x5p2-minmax-neon-1x4-acc3.c.o
[ 32%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-5x5p2-minmax-neon-1x4-acc4.c.o
[ 32%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-5x5p2-minmax-neon-1x4-acc5.c.o
[ 32%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-5x5p2-minmax-neon-1x4.c.o
[ 32%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-5x5p2-minmax-neon-2x4-acc2.c.o
[ 32%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/SavedTensorHooks.cpp.o
[ 32%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-5x5p2-minmax-neon-2x4-acc3.c.o
[ 32%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-5x5p2-minmax-neon-2x4.c.o
[ 32%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-5x5p2-minmax-neon-3x4-acc2.c.o
[ 32%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/ScalarOps.cpp.o
[ 32%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-5x5p2-minmax-neon-3x4.c.o
[ 32%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-5x5p2-minmax-neon-4x4-acc2.c.o
[ 32%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-5x5p2-minmax-neon-4x4.c.o
[ 32%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/SequenceNumber.cpp.o
[ 32%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/SparseCsrTensorImpl.cpp.o
[ 32%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/SparseTensorImpl.cpp.o
[ 32%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-5x5p2-minmax-neon-5x4.c.o
[ 32%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-5x5s2p2-minmax-neon-1x4-acc2.c.o
[ 32%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-5x5s2p2-minmax-neon-1x4-acc3.c.o
[ 32%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-5x5s2p2-minmax-neon-1x4-acc4.c.o
[ 32%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-5x5s2p2-minmax-neon-1x4-acc5.c.o
[ 33%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-5x5s2p2-minmax-neon-1x4.c.o
[ 33%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-5x5s2p2-minmax-neon-2x4-acc2.c.o
[ 33%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-5x5s2p2-minmax-neon-2x4-acc3.c.o
[ 33%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-5x5s2p2-minmax-neon-2x4.c.o
[ 33%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/StorageUtils.cpp.o
[ 33%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-5x5s2p2-minmax-neon-3x4-acc2.c.o
[ 33%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-5x5s2p2-minmax-neon-3x4.c.o
[ 33%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-3p4c-minmax-neon-acc2.c.o
[ 33%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-3p4c-minmax-neon.c.o
[ 33%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-3p8c-minmax-neon-acc2.c.o
[ 33%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-3p8c-minmax-neon.c.o
[ 33%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-3p16c-minmax-neon-acc2.c.o
[ 33%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-3p16c-minmax-neon.c.o
[ 33%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-4p4c-minmax-neon-acc2.c.o
[ 33%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-4p4c-minmax-neon.c.o
[ 33%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-4p8c-minmax-neon-acc2.c.o
[ 33%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-4p8c-minmax-neon.c.o
[ 33%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-4p16c-minmax-neon-acc2.c.o
[ 33%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-4p16c-minmax-neon.c.o
[ 33%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-5f5m5l4c4s4r-minmax-neon-acc2.c.o
[ 33%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-5f5m5l4c4s4r-minmax-neon.c.o
[ 33%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-5f5m5l8c4s4r-minmax-neon-acc2.c.o
[ 33%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/TensorGeometry.cpp.o
[ 33%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-5f5m5l8c4s4r-minmax-neon.c.o
[ 33%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/TensorIndexing.cpp.o
[ 33%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-6f6m7l4c4s4r-minmax-neon-acc2.c.o
[ 33%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-6f6m7l4c4s4r-minmax-neon.c.o
[ 33%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/TensorIterator.cpp.o
[ 33%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-6f6m7l8c4s4r-minmax-neon-acc2.c.o
[ 33%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-6f6m7l8c4s4r-minmax-neon.c.o
[ 33%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/TensorMeta.cpp.o
[ 33%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-8f8m9l4c4s4r-minmax-neon-acc2.c.o
[ 33%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-8f8m9l4c4s4r-minmax-neon.c.o
[ 33%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-8f8m9l8c4s4r-minmax-neon-acc2.c.o
[ 33%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-8f8m9l8c4s4r-minmax-neon.c.o
[ 33%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/TensorNames.cpp.o
[ 33%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-9p4c-minmax-neon-acc2.c.o
[ 33%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-9p4c-minmax-neon.c.o
[ 33%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-9p8c-minmax-neon-acc2.c.o
[ 33%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-9p8c-minmax-neon.c.o
[ 33%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-9p16c-minmax-neon-acc2.c.o
[ 33%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-9p16c-minmax-neon.c.o
[ 33%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-25p4c-minmax-neon-acc2.c.o
[ 33%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-25p4c-minmax-neon.c.o
[ 33%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-25p8c-minmax-neon-acc2.c.o
[ 33%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-25p8c-minmax-neon.c.o
[ 33%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/TensorUtils.cpp.o
[ 33%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-25p16c-minmax-neon-acc2.c.o
[ 33%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-25p16c-minmax-neon.c.o
[ 33%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/ThreadLocalPythonObjects.cpp.o
[ 33%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-f16-vcvt/gen/f32-f16-vcvt-neon-u8.c.o
[ 33%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-f16-vcvt/gen/f32-f16-vcvt-neon-u16.c.o
[ 33%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-f16-vcvt/gen/f32-f16-vcvt-neon-u24.c.o
[ 33%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/ThreadLocalState.cpp.o
[ 33%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-f16-vcvt/gen/f32-f16-vcvt-neon-u32.c.o
[ 33%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gavgpool-cw/f32-gavgpool-cw-neon-u4.c.o
[ 33%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gavgpool/f32-gavgpool-7p7x-minmax-neon-c4.c.o
[ 33%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/Utils.cpp.o
[ 33%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gavgpool/f32-gavgpool-7x-minmax-neon-c4.c.o
[ 33%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-1x8-minmax-neon-dup-ld64.c.o
[ 33%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-1x8-minmax-neon-lane-ld64.c.o
[ 33%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-1x8-minmax-neon-lane-ld128.c.o
[ 33%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-1x8s4-minmax-neon.c.o
[ 33%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-1x16-minmax-neon-lane-ld128.c.o
[ 33%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-2x16-minmax-neon-lane-ld128.c.o
[ 33%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-3x16-minmax-neon-lane-ld128.c.o
[ 33%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-4x2-minmax-neon-lane-ld64.c.o
[ 33%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-4x8-minmax-neon-dup-ld64.c.o
[ 33%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-4x8-minmax-neon-dup-ld128.c.o
[ 33%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-4x8-minmax-neon-lane-ld64.c.o
[ 33%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/Version.cpp.o
[ 34%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-4x8-minmax-neon-lane-ld128.c.o
[ 34%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-4x8s4-minmax-neon.c.o
[ 34%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-4x16-minmax-neon-lane-ld128.c.o
[ 34%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-5x8-minmax-neon-lane-ld64.c.o
[ 34%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-5x16-minmax-neon-lane-ld128.c.o
[ 34%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-6x2-minmax-neon-lane-ld64.c.o
[ 34%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/VmapModeRegistrations.cpp.o
[ 34%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-6x8-minmax-neon-dup-ld64.c.o
[ 34%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-6x8-minmax-neon-dup-ld128.c.o
[ 34%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-6x8-minmax-neon-lane-ld64.c.o
[ 34%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/ZeroTensorFallback.cpp.o
[ 34%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-6x8-minmax-neon-lane-ld128.c.o
[ 34%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-6x8s4-minmax-neon.c.o
[ 34%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-6x16-minmax-neon-lane-ld128.c.o
[ 34%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/autocast_mode.cpp.o
[ 34%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-8x8s4-minmax-neon.c.o
[ 34%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemminc/gen/f32-gemminc-1x8-minmax-neon-dup-ld64.c.o
[ 34%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemminc/gen/f32-gemminc-1x8-minmax-neon-lane-ld64.c.o
[ 34%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemminc/gen/f32-gemminc-1x8-minmax-neon-lane-ld128.c.o
[ 34%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemminc/gen/f32-gemminc-1x8s4-minmax-neon.c.o
[ 34%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemminc/gen/f32-gemminc-4x8-minmax-neon-dup-ld64.c.o
[ 34%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemminc/gen/f32-gemminc-4x8-minmax-neon-dup-ld128.c.o
[ 34%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemminc/gen/f32-gemminc-4x8-minmax-neon-lane-ld64.c.o
[ 34%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemminc/gen/f32-gemminc-4x8-minmax-neon-lane-ld128.c.o
[ 34%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemminc/gen/f32-gemminc-4x8s4-minmax-neon.c.o
[ 34%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemminc/gen/f32-gemminc-5x8-minmax-neon-lane-ld64.c.o
[ 34%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemminc/gen/f32-gemminc-6x8-minmax-neon-dup-ld64.c.o
[ 34%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemminc/gen/f32-gemminc-6x8-minmax-neon-dup-ld128.c.o
[ 34%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemminc/gen/f32-gemminc-6x8-minmax-neon-lane-ld64.c.o
[ 34%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemminc/gen/f32-gemminc-6x8-minmax-neon-lane-ld128.c.o
[ 34%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemminc/gen/f32-gemminc-6x8s4-minmax-neon.c.o
[ 34%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemminc/gen/f32-gemminc-8x8s4-minmax-neon.c.o
[ 34%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-ibilinear-chw/gen/f32-ibilinear-chw-neon-p4.c.o
[ 34%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-ibilinear-chw/gen/f32-ibilinear-chw-neon-p8.c.o
[ 34%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/cpu/FlushDenormal.cpp.o
[ 34%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-ibilinear-chw/gen/f32-ibilinear-chw-neon-p16.c.o
[ 34%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/cpu/Utils.cpp.o
[ 34%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/detail/CPUGuardImpl.cpp.o
[ 34%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-ibilinear/gen/f32-ibilinear-neon-c4.c.o
[ 34%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-ibilinear/gen/f32-ibilinear-neon-c8.c.o
[ 34%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-1x8-minmax-neon-dup-ld64.c.o
[ 34%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/detail/CUDAHooksInterface.cpp.o
[ 34%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-1x8-minmax-neon-lane-ld64.c.o
[ 34%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-1x8-minmax-neon-lane-ld128.c.o
[ 34%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-1x8s4-minmax-neon.c.o
[ 34%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/detail/HIPHooksInterface.cpp.o
[ 34%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-1x16-minmax-neon-lane-ld128.c.o
[ 34%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-2x16-minmax-neon-lane-ld128.c.o
[ 34%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-3x16-minmax-neon-lane-ld128.c.o
[ 34%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-4x2-minmax-neon-lane-ld64.c.o
[ 34%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-4x4-minmax-neon-lane-ld64.c.o
[ 34%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-4x8-minmax-neon-dup-ld64.c.o
[ 34%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-4x8-minmax-neon-dup-ld128.c.o
[ 34%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-4x8-minmax-neon-lane-ld64.c.o
[ 34%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-4x8-minmax-neon-lane-ld128.c.o
[ 34%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-4x8s4-minmax-neon.c.o
[ 34%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-4x16-minmax-neon-lane-ld128.c.o
[ 34%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-5x16-minmax-neon-lane-ld128.c.o
[ 34%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/detail/IPUHooksInterface.cpp.o
[ 34%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-6x2-minmax-neon-lane-ld64.c.o
[ 34%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-6x8-minmax-neon-dup-ld64.c.o
[ 34%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-6x8-minmax-neon-dup-ld128.c.o
[ 34%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-6x8-minmax-neon-lane-ld64.c.o
[ 34%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-6x8-minmax-neon-lane-ld128.c.o
[ 34%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-6x8s4-minmax-neon.c.o
[ 34%] Building C object
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-6x16-minmax-neon-lane-ld128.c.o [ 34%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-8x8s4-minmax-neon.c.o [ 34%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-maxpool/f32-maxpool-9p8x-minmax-neon-c4.c.o [ 34%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-pavgpool/f32-pavgpool-9p8x-minmax-neon-c4.c.o [ 34%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/detail/MPSHooksInterface.cpp.o [ 35%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-pavgpool/f32-pavgpool-9x-minmax-neon-c4.c.o [ 35%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-ppmm/gen/f32-ppmm-4x8-minmax-neon-prfm.c.o [ 35%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-ppmm/gen/f32-ppmm-4x8-minmax-neon.c.o [ 35%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-ppmm/gen/f32-ppmm-4x16-minmax-neon-prfm.c.o [ 35%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-ppmm/gen/f32-ppmm-4x16-minmax-neon.c.o [ 35%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-ppmm/gen/f32-ppmm-8x8-minmax-neon-prfm.c.o [ 35%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-ppmm/gen/f32-ppmm-8x8-minmax-neon.c.o [ 35%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-prelu/gen/f32-prelu-neon-1x4.c.o [ 35%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-prelu/gen/f32-prelu-neon-1x8.c.o [ 35%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-prelu/gen/f32-prelu-neon-1x16.c.o [ 35%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-prelu/gen/f32-prelu-neon-2x4.c.o [ 35%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-prelu/gen/f32-prelu-neon-2x8.c.o [ 35%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-prelu/gen/f32-prelu-neon-2x16.c.o [ 35%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-prelu/gen/f32-prelu-neon-4x4.c.o [ 35%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-prelu/gen/f32-prelu-neon-4x8.c.o [ 35%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/detail/MTIAHooksInterface.cpp.o [ 35%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-prelu/gen/f32-prelu-neon-4x16.c.o [ 35%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc4w-gemm/gen/f32-qc4w-gemm-1x8-minmax-neon-dup-ld64.c.o [ 35%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc4w-gemm/gen/f32-qc4w-gemm-1x8-minmax-neon-lane-ld64.c.o [ 35%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc4w-gemm/gen/f32-qc4w-gemm-4x8-minmax-neon-dup-ld64.c.o [ 35%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc4w-gemm/gen/f32-qc4w-gemm-4x8-minmax-neon-lane-ld64.c.o [ 35%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc4w-gemm/gen/f32-qc4w-gemm-5x8-minmax-neon-lane-ld64.c.o [ 35%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/detail/MetaGuardImpl.cpp.o [ 35%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc4w-gemm/gen/f32-qc4w-gemm-6x8-minmax-neon-dup-ld64.c.o [ 35%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc4w-gemm/gen/f32-qc4w-gemm-6x8-minmax-neon-lane-ld64.c.o [ 35%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-1x8-minmax-neon-dup-ld64.c.o [ 35%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-1x8-minmax-neon-lane-ld64.c.o [ 35%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-4x2-minmax-neon-lane-ld64.c.o [ 35%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/detail/ORTHooksInterface.cpp.o [ 35%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-4x8-minmax-neon-dup-ld64.c.o [ 35%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-4x8-minmax-neon-lane-ld64.c.o [ 35%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-5x8-minmax-neon-lane-ld64.c.o [ 35%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-6x2-minmax-neon-lane-ld64.c.o [ 35%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-6x8-minmax-neon-dup-ld64.c.o [ 35%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-6x8-minmax-neon-lane-ld64.c.o [ 35%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/detail/PrivateUse1HooksInterface.cpp.o [ 35%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qs8-vcvt/gen/f32-qs8-vcvt-neon-u8.c.o [ 35%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qs8-vcvt/gen/f32-qs8-vcvt-neon-u16.c.o [ 35%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qs8-vcvt/gen/f32-qs8-vcvt-neon-u24.c.o [ 35%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qs8-vcvt/gen/f32-qs8-vcvt-neon-u32.c.o [ 35%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/detail/XPUHooksInterface.cpp.o [ 35%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qu8-vcvt/gen/f32-qu8-vcvt-neon-u8.c.o [ 35%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qu8-vcvt/gen/f32-qu8-vcvt-neon-u16.c.o [ 35%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qu8-vcvt/gen/f32-qu8-vcvt-neon-u24.c.o [ 35%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qu8-vcvt/gen/f32-qu8-vcvt-neon-u32.c.o [ 35%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/functorch/ADInterpreters.cpp.o [ 35%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddstoreexpminusmax/gen/f32-raddstoreexpminusmax-neon-rr2-lut64-p2-u4.c.o [ 35%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddstoreexpminusmax/gen/f32-raddstoreexpminusmax-neon-rr2-lut64-p2-u8-acc2.c.o [ 35%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddstoreexpminusmax/gen/f32-raddstoreexpminusmax-neon-rr2-lut64-p2-u8.c.o [ 35%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/functorch/BatchRulesActivation.cpp.o [ 35%] 
Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddstoreexpminusmax/gen/f32-raddstoreexpminusmax-neon-rr2-lut64-p2-u12-acc2.c.o [ 35%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddstoreexpminusmax/gen/f32-raddstoreexpminusmax-neon-rr2-lut64-p2-u12-acc3.c.o [ 35%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddstoreexpminusmax/gen/f32-raddstoreexpminusmax-neon-rr2-lut64-p2-u12.c.o [ 35%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/functorch/BatchRulesBinaryOps.cpp.o [ 35%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddstoreexpminusmax/gen/f32-raddstoreexpminusmax-neon-rr2-lut64-p2-u16-acc2.c.o [ 35%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddstoreexpminusmax/gen/f32-raddstoreexpminusmax-neon-rr2-lut64-p2-u16-acc4.c.o [ 35%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddstoreexpminusmax/gen/f32-raddstoreexpminusmax-neon-rr2-lut64-p2-u16.c.o [ 35%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddstoreexpminusmax/gen/f32-raddstoreexpminusmax-neon-rr2-lut64-p2-u20-acc2.c.o [ 35%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddstoreexpminusmax/gen/f32-raddstoreexpminusmax-neon-rr2-lut64-p2-u20-acc5.c.o [ 35%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddstoreexpminusmax/gen/f32-raddstoreexpminusmax-neon-rr2-lut64-p2-u20.c.o [ 35%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddstoreexpminusmax/gen/f32-raddstoreexpminusmax-neon-rr2-p5-u4.c.o [ 35%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddstoreexpminusmax/gen/f32-raddstoreexpminusmax-neon-rr2-p5-u8-acc2.c.o [ 35%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddstoreexpminusmax/gen/f32-raddstoreexpminusmax-neon-rr2-p5-u8.c.o [ 35%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddstoreexpminusmax/gen/f32-raddstoreexpminusmax-neon-rr2-p5-u12-acc2.c.o [ 35%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddstoreexpminusmax/gen/f32-raddstoreexpminusmax-neon-rr2-p5-u12-acc3.c.o [ 35%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddstoreexpminusmax/gen/f32-raddstoreexpminusmax-neon-rr2-p5-u12.c.o [ 35%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddstoreexpminusmax/gen/f32-raddstoreexpminusmax-neon-rr2-p5-u16-acc2.c.o [ 35%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/functorch/BatchRulesConvolution.cpp.o [ 36%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddstoreexpminusmax/gen/f32-raddstoreexpminusmax-neon-rr2-p5-u16-acc4.c.o [ 36%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddstoreexpminusmax/gen/f32-raddstoreexpminusmax-neon-rr2-p5-u16.c.o [ 36%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddstoreexpminusmax/gen/f32-raddstoreexpminusmax-neon-rr2-p5-u20-acc2.c.o [ 36%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddstoreexpminusmax/gen/f32-raddstoreexpminusmax-neon-rr2-p5-u20-acc5.c.o [ 36%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddstoreexpminusmax/gen/f32-raddstoreexpminusmax-neon-rr2-p5-u20.c.o [ 36%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-rminmax/gen/f32-rmax-neon-u4.c.o [ 36%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-rminmax/gen/f32-rmax-neon-u8-acc2.c.o [ 36%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-rminmax/gen/f32-rmax-neon-u12-acc3.c.o [ 36%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-rminmax/gen/f32-rmax-neon-u16-acc2.c.o [ 36%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-rminmax/gen/f32-rmax-neon-u16-acc4.c.o [ 36%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-rminmax/gen/f32-rmin-neon-u4.c.o [ 36%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-rminmax/gen/f32-rmin-neon-u8-acc2.c.o [ 36%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-rminmax/gen/f32-rmin-neon-u12-acc3.c.o [ 36%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-rminmax/gen/f32-rmin-neon-u16-acc2.c.o [ 36%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-rminmax/gen/f32-rmin-neon-u16-acc4.c.o [ 36%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-rminmax/gen/f32-rminmax-neon-u4.c.o [ 36%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-rminmax/gen/f32-rminmax-neon-u8-acc2.c.o [ 36%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-rminmax/gen/f32-rminmax-neon-u12-acc3.c.o [ 36%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-rminmax/gen/f32-rminmax-neon-u16-acc2.c.o [ 36%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-rminmax/gen/f32-rminmax-neon-u16-acc4.c.o [ 36%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-rsum/gen/f32-rsum-neon-u4.c.o [ 36%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-rsum/gen/f32-rsum-neon-u8-acc2.c.o [ 36%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/functorch/BatchRulesDecompositions.cpp.o [ 36%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-rsum/gen/f32-rsum-neon-u12-acc3.c.o [ 36%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-rsum/gen/f32-rsum-neon-u16-acc2.c.o [ 36%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-rsum/gen/f32-rsum-neon-u16-acc4.c.o [ 36%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-spmm/gen/f32-spmm-4x1-minmax-neon-pipelined.c.o [ 36%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-spmm/gen/f32-spmm-4x1-minmax-neon-x2.c.o [ 36%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-spmm/gen/f32-spmm-4x1-minmax-neon.c.o [ 36%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-spmm/gen/f32-spmm-8x1-minmax-neon-pipelined.c.o [ 36%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-spmm/gen/f32-spmm-8x1-minmax-neon-x2.c.o [ 36%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-spmm/gen/f32-spmm-8x1-minmax-neon.c.o [ 36%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-spmm/gen/f32-spmm-12x1-minmax-neon.c.o [ 36%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-spmm/gen/f32-spmm-16x1-minmax-neon-pipelined.c.o [ 36%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-spmm/gen/f32-spmm-16x1-minmax-neon-x2.c.o [ 36%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-spmm/gen/f32-spmm-16x1-minmax-neon.c.o [ 36%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-spmm/gen/f32-spmm-32x1-minmax-neon-pipelined.c.o [ 36%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-spmm/gen/f32-spmm-32x1-minmax-neon-x2.c.o [ 36%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-spmm/gen/f32-spmm-32x1-minmax-neon.c.o [ 36%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vadd-minmax-neon-u4.c.o [ 36%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vadd-minmax-neon-u8.c.o [ 36%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vaddc-minmax-neon-u4.c.o [ 36%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vaddc-minmax-neon-u8.c.o [ 36%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vmax-neon-u4.c.o [ 36%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vmax-neon-u8.c.o [ 36%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vmaxc-neon-u4.c.o [ 36%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vmaxc-neon-u8.c.o [ 36%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vmin-neon-u4.c.o [ 36%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/functorch/BatchRulesDynamic.cpp.o [ 36%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vmin-neon-u8.c.o [ 36%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vminc-neon-u4.c.o [ 36%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vminc-neon-u8.c.o [ 36%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vmul-minmax-neon-u4.c.o [ 36%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vmul-minmax-neon-u8.c.o [ 36%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vmulc-minmax-neon-u4.c.o [ 36%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/functorch/BatchRulesFactory.cpp.o [ 36%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vmulc-minmax-neon-u8.c.o [ 36%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vrsubc-minmax-neon-u4.c.o [ 36%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vrsubc-minmax-neon-u8.c.o [ 36%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vsqrdiff-neon-u4.c.o [ 36%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vsqrdiff-neon-u8.c.o [ 36%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vsqrdiffc-neon-u4.c.o [ 36%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vsqrdiffc-neon-u8.c.o [ 37%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vsub-minmax-neon-u4.c.o [ 37%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vsub-minmax-neon-u8.c.o [ 37%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vsubc-minmax-neon-u4.c.o [ 37%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vsubc-minmax-neon-u8.c.o [ 37%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vclamp/gen/f32-vclamp-neon-u4.c.o [ 37%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vclamp/gen/f32-vclamp-neon-u8.c.o [ 37%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vclamp/gen/f32-vclamp-neon-u16.c.o [ 37%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vcmul/gen/f32-vcmul-neon-u4.c.o [ 37%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vcmul/gen/f32-vcmul-neon-u8.c.o [ 37%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vcmul/gen/f32-vcmul-neon-u12.c.o [ 37%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vcmul/gen/f32-vcmul-neon-u16.c.o [ 37%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-velu/gen/f32-velu-neon-rr2-lut16-p3-u4.c.o [ 37%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-velu/gen/f32-velu-neon-rr2-lut16-p3-u8.c.o [ 37%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-velu/gen/f32-velu-neon-rr2-lut16-p3-u12.c.o [ 37%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-velu/gen/f32-velu-neon-rr2-lut16-p3-u16.c.o [ 37%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-velu/gen/f32-velu-neon-rr2-lut16-p3-u20.c.o [ 37%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-velu/gen/f32-velu-neon-rr2-lut16-p3-u24.c.o [ 37%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-velu/gen/f32-velu-neon-rr2-p6-u4.c.o [ 37%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-velu/gen/f32-velu-neon-rr2-p6-u8.c.o [ 37%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-velu/gen/f32-velu-neon-rr2-p6-u12.c.o [ 37%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-velu/gen/f32-velu-neon-rr2-p6-u16.c.o [ 37%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-velu/gen/f32-velu-neon-rr2-p6-u20.c.o [ 37%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-velu/gen/f32-velu-neon-rr2-p6-u24.c.o [ 37%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vhswish/gen/f32-vhswish-neon-u4.c.o [ 37%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vhswish/gen/f32-vhswish-neon-u8.c.o [ 37%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/functorch/BatchRulesHelper.cpp.o [ 37%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vhswish/gen/f32-vhswish-neon-u16.c.o [ 37%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vlrelu/gen/f32-vlrelu-neon-u4.c.o [ 37%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vlrelu/gen/f32-vlrelu-neon-u8.c.o [ 37%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vmulcaddc/gen/f32-vmulcaddc-c4-minmax-neon-2x.c.o [ 37%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vmulcaddc/gen/f32-vmulcaddc-c8-minmax-neon-2x.c.o [ 37%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vrelu/gen/f32-vrelu-neon-u4.c.o [ 37%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/functorch/BatchRulesLinearAlgebra.cpp.o [ 37%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vrelu/gen/f32-vrelu-neon-u8.c.o [ 37%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vrnd/gen/f32-vrndd-neon-u4.c.o [ 37%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vrnd/gen/f32-vrndd-neon-u8.c.o [ 37%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vrnd/gen/f32-vrndne-neon-u4.c.o [ 37%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vrnd/gen/f32-vrndne-neon-u8.c.o [ 37%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vrnd/gen/f32-vrndu-neon-u4.c.o [ 37%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vrnd/gen/f32-vrndu-neon-u8.c.o [ 37%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/functorch/BatchRulesLoss.cpp.o [ 37%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vrnd/gen/f32-vrndz-neon-u4.c.o [ 37%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vrnd/gen/f32-vrndz-neon-u8.c.o [ 37%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-neon-rr2-lut64-p2-nr2recps-u4.c.o [ 37%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-neon-rr2-lut64-p2-nr2recps-u8.c.o [ 37%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-neon-rr2-lut64-p2-nr2recps-u12.c.o [ 37%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-neon-rr2-lut64-p2-nr2recps-u16.c.o [ 37%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-neon-rr2-lut64-p2-nr2recps-u20.c.o [ 37%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-neon-rr2-lut64-p2-nr2recps-u24.c.o [ 37%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-neon-rr2-lut2048-p1-nr2recps-u4.c.o [ 37%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-neon-rr2-lut2048-p1-nr2recps-u8.c.o [ 37%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-neon-rr2-lut2048-p1-nr2recps-u12.c.o [ 37%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-neon-rr2-lut2048-p1-nr2recps-u16.c.o [ 37%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-neon-rr2-lut2048-p1-nr2recps-u20.c.o [ 37%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-neon-rr2-lut2048-p1-nr2recps-u24.c.o [ 37%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-neon-rr2-p5-nr2recps-u4.c.o [ 37%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-neon-rr2-p5-nr2recps-u8.c.o [ 37%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-neon-rr2-p5-nr2recps-u12.c.o [ 37%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-neon-rr2-p5-nr2recps-u16.c.o [ 37%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-neon-rr2-p5-nr2recps-u20.c.o [ 37%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-neon-rr2-p5-nr2recps-u24.c.o [ 37%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-neon-expm1minus-rr1-p6h5ts-nr2recps-u4.c.o [ 37%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-neon-expm1minus-rr1-p6h5ts-nr2recps-u8.c.o [ 38%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-neon-expm1minus-rr1-p6h5ts-nr2recps-u12.c.o [ 38%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/functorch/BatchRulesModules.cpp.o [ 38%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-neon-expm1minus-rr1-p6h5ts-nr2recps-u16.c.o [ 38%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vunary/gen/f32-vabs-neon-u4.c.o [ 38%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vunary/gen/f32-vabs-neon-u8.c.o [ 38%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vunary/gen/f32-vneg-neon-u4.c.o [ 38%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vunary/gen/f32-vneg-neon-u8.c.o [ 38%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vunary/gen/f32-vsqr-neon-u4.c.o [ 38%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vunary/gen/f32-vsqr-neon-u8.c.o [ 38%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/i16-vlshift/gen/i16-vlshift-neon-u8.c.o [ 38%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/i16-vlshift/gen/i16-vlshift-neon-u16.c.o [ 38%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/i16-vlshift/gen/i16-vlshift-neon-u24.c.o [ 38%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/i16-vlshift/gen/i16-vlshift-neon-u32.c.o [ 38%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f16-f32-cvt-neon-int16.c.o [ 38%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f16-f32-cvt-neon-int32.c.o [ 38%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-expm1minus-neon-rr2-lut16-p3.c.o [ 38%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-expm1minus-neon-rr2-p6.c.o [ 38%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-f16-cvt-neon.c.o [ 38%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-qs8-cvt-neon.c.o [ 38%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-qu8-cvt-neon.c.o [ 38%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/functorch/BatchRulesNorm.cpp.o [ 38%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-roundd-neon-addsub.c.o [ 38%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-roundd-neon-cvt.c.o [ 38%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-roundne-neon-addsub.c.o [ 38%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/functorch/BatchRulesPooling.cpp.o [ 38%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-roundu-neon-addsub.c.o [ 38%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-roundu-neon-cvt.c.o [ 38%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-roundz-neon-addsub.c.o [ 38%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-roundz-neon-cvt.c.o [ 38%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-sigmoid-neon-rr2-lut64-p2-nr2recps.c.o [ 38%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-sigmoid-neon-rr2-lut2048-p1-nr2recps.c.o [ 38%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-sigmoid-neon-rr2-p5-nr2recps.c.o [ 38%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-sqrt-neon-nr1rsqrts.c.o [ 38%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-sqrt-neon-nr2rsqrts.c.o [ 38%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-sqrt-neon-nr3rsqrts.c.o [ 38%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-neon-expm1minus-rr1-p6h5ts-nr2recps.c.o [ 38%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-neon-expm1minus-rr2-lut8-p4h2ts-nr2recps.c.o [ 38%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-neon-expm1minus-rr2-lut8-p4h3ps-nr2recps.c.o [ 38%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-1x16-minmax-neon-mlal-lane-prfm.c.o [ 38%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-1x16-minmax-neon-mlal-lane.c.o [ 38%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-2x16-minmax-neon-mlal-lane-prfm.c.o [ 38%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-2x16-minmax-neon-mlal-lane.c.o [ 38%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-3x16-minmax-neon-mlal-lane-prfm.c.o [ 38%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-3x16-minmax-neon-mlal-lane.c.o [ 38%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-4x16-minmax-neon-mlal-lane-prfm.c.o [ 38%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-4x16-minmax-neon-mlal-lane.c.o [ 38%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-6x16-minmax-neon-mlal-lane-prfm.c.o [ 38%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-6x16-minmax-neon-mlal-lane.c.o [ 38%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-1x8-minmax-neon-mlal-lane-prfm.c.o [ 38%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-1x8-minmax-neon-mlal-lane.c.o [ 38%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-1x8c2s4-minmax-neon-mlal.c.o [ 38%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/functorch/BatchRulesRandomness.cpp.o [ 38%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-1x16-minmax-neon-mlal-lane-prfm.c.o [ 38%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-1x16-minmax-neon-mlal-lane.c.o [ 38%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-2x8-minmax-neon-mlal-lane-prfm.c.o [ 38%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-2x8-minmax-neon-mlal-lane.c.o [ 38%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-2x8c2s4-minmax-neon-mlal.c.o [ 38%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-2x16-minmax-neon-mlal-lane-prfm.c.o [ 38%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-2x16-minmax-neon-mlal-lane.c.o [ 38%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-3x8-minmax-neon-mlal-lane-prfm.c.o [ 38%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/functorch/BatchRulesReduceOps.cpp.o [ 38%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-3x8-minmax-neon-mlal-lane.c.o [ 38%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-3x16-minmax-neon-mlal-lane-prfm.c.o [ 38%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-3x16-minmax-neon-mlal-lane.c.o [ 39%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-4x8-minmax-neon-mlal-lane-prfm.c.o [ 39%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/functorch/BatchRulesScatterOps.cpp.o [ 39%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-4x8-minmax-neon-mlal-lane.c.o [ 39%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-4x16-minmax-neon-mlal-lane-prfm.c.o [ 39%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-4x16-minmax-neon-mlal-lane.c.o [ 39%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-6x8-minmax-neon-mlal-lane-prfm.c.o [ 39%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-6x8-minmax-neon-mlal-lane.c.o [ 39%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-6x16-minmax-neon-mlal-lane-prfm.c.o [ 39%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-6x16-minmax-neon-mlal-lane.c.o [ 39%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-1x8-minmax-neon-mlal-lane-prfm.c.o [ 39%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-1x8-minmax-neon-mlal-lane.c.o [ 39%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-1x8c2s4-minmax-neon-mlal.c.o [ 39%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-1x16-minmax-neon-mlal-lane-prfm.c.o [ 39%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-1x16-minmax-neon-mlal-lane.c.o [ 39%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-2x8-minmax-neon-mlal-lane-prfm.c.o [ 39%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-2x8-minmax-neon-mlal-lane.c.o [ 39%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-2x8c2s4-minmax-neon-mlal.c.o [ 39%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-2x16-minmax-neon-mlal-lane-prfm.c.o [ 39%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-2x16-minmax-neon-mlal-lane.c.o [ 39%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-3x8-minmax-neon-mlal-lane-prfm.c.o [ 39%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-3x8-minmax-neon-mlal-lane.c.o [ 39%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-3x16-minmax-neon-mlal-lane-prfm.c.o [ 39%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-3x16-minmax-neon-mlal-lane.c.o [ 39%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-4x8-minmax-neon-mlal-lane-prfm.c.o [ 39%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-4x8-minmax-neon-mlal-lane.c.o [ 39%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-4x16-minmax-neon-mlal-lane-prfm.c.o [ 39%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-4x16-minmax-neon-mlal-lane.c.o [ 39%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/functorch/BatchRulesUnaryOps.cpp.o [ 39%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-6x8-minmax-neon-mlal-lane-prfm.c.o [ 39%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-6x8-minmax-neon-mlal-lane.c.o [ 39%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-6x16-minmax-neon-mlal-lane-prfm.c.o [ 39%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/functorch/BatchRulesViews.cpp.o [ 39%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-6x16-minmax-neon-mlal-lane.c.o [ 39%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-5f5m5l8c8s8r-minmax-fp32-neon-mul16.c.o [ 39%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-5f5m5l8c8s8r-minmax-rndnu-neon-mla8-ld64.c.o [ 39%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-5f5m5l8c8s8r-minmax-rndnu-neon-mul8-ld64.c.o [ 39%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-5f5m5l8c8s8r-minmax-rndnu-neon-mul16.c.o [ 39%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-5f5m5l16c8s8r-minmax-fp32-neon-mul16.c.o [ 39%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-5f5m5l16c8s8r-minmax-rndnu-neon-mla8-ld64.c.o [ 39%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/functorch/BatchedFallback.cpp.o [ 39%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-5f5m5l16c8s8r-minmax-rndnu-neon-mla8-ld128.c.o [ 39%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-5f5m5l16c8s8r-minmax-rndnu-neon-mul8-ld64.c.o [ 39%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-5f5m5l16c8s8r-minmax-rndnu-neon-mul8-ld128.c.o [ 39%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-5f5m5l16c8s8r-minmax-rndnu-neon-mul16.c.o [ 39%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-5f5m5l32c8s8r-minmax-fp32-neon-mul16.c.o [ 39%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-5f5m5l32c8s8r-minmax-rndnu-neon-mul16.c.o [ 39%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-6f6m7l8c8s8r-minmax-fp32-neon-mul16.c.o [ 39%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-6f6m7l8c8s8r-minmax-rndnu-neon-mla8-ld64.c.o [ 39%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-6f6m7l8c8s8r-minmax-rndnu-neon-mul8-ld64.c.o [ 39%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-6f6m7l8c8s8r-minmax-rndnu-neon-mul16.c.o [ 39%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-6f6m7l16c8s8r-minmax-fp32-neon-mul16.c.o [ 39%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-6f6m7l16c8s8r-minmax-rndnu-neon-mla8-ld64.c.o [ 39%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-6f6m7l16c8s8r-minmax-rndnu-neon-mla8-ld128.c.o [ 39%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-6f6m7l16c8s8r-minmax-rndnu-neon-mul8-ld64.c.o [ 39%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-6f6m7l16c8s8r-minmax-rndnu-neon-mul8-ld128.c.o [ 39%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-6f6m7l16c8s8r-minmax-rndnu-neon-mul16.c.o [ 39%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-6f6m7l32c8s8r-minmax-fp32-neon-mul16.c.o [ 39%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-6f6m7l32c8s8r-minmax-rndnu-neon-mul16.c.o [ 39%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-8f8m9l8c8s8r-minmax-fp32-neon-mul16.c.o [ 39%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-8f8m9l8c8s8r-minmax-rndnu-neon-mla8-ld64.c.o [ 39%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-8f8m9l8c8s8r-minmax-rndnu-neon-mul8-ld64.c.o [ 39%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-8f8m9l8c8s8r-minmax-rndnu-neon-mul16.c.o [ 39%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-8f8m9l16c8s8r-minmax-fp32-neon-mul16.c.o [ 39%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-8f8m9l16c8s8r-minmax-rndnu-neon-mla8-ld64.c.o [ 40%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-8f8m9l16c8s8r-minmax-rndnu-neon-mla8-ld128.c.o [ 40%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/functorch/BatchedTensorImpl.cpp.o [ 40%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-8f8m9l16c8s8r-minmax-rndnu-neon-mul8-ld64.c.o [ 40%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/functorch/DynamicLayer.cpp.o [ 40%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-8f8m9l16c8s8r-minmax-rndnu-neon-mul8-ld128.c.o [ 40%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/functorch/FunctionalizeInterpreter.cpp.o [ 40%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-8f8m9l16c8s8r-minmax-rndnu-neon-mul16.c.o [ 40%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-8f8m9l32c8s8r-minmax-fp32-neon-mul16.c.o [ 40%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-8f8m9l32c8s8r-minmax-rndnu-neon-mul16.c.o [ 40%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-9p8c-minmax-fp32-neon-mul16.c.o [ 40%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-9p8c-minmax-rndnu-neon-mla8-ld64.c.o [ 40%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-9p8c-minmax-rndnu-neon-mul8-ld64.c.o [ 40%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-9p8c-minmax-rndnu-neon-mul16.c.o [ 40%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-9p16c-minmax-fp32-neon-mul16.c.o [ 
40%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-9p16c-minmax-rndnu-neon-mla8-ld64.c.o
[ 40%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/functorch/Interpreter.cpp.o
[ 40%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-9p16c-minmax-rndnu-neon-mla8-ld128.c.o
[ 40%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-9p16c-minmax-rndnu-neon-mul8-ld64.c.o
[ 40%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-9p16c-minmax-rndnu-neon-mul8-ld128.c.o
[ 40%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-9p16c-minmax-rndnu-neon-mul16.c.o
[ 40%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-9p32c-minmax-fp32-neon-mul16.c.o
[ 40%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/functorch/LegacyBatchingRegistrations.cpp.o
[ 40%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-9p32c-minmax-rndnu-neon-mul16.c.o
[ 40%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-25p8c-minmax-fp32-neon-mul16.c.o
[ 40%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-25p8c-minmax-rndnu-neon-mla8-ld64.c.o
[ 40%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-25p8c-minmax-rndnu-neon-mul8-ld64.c.o
[ 40%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-25p8c-minmax-rndnu-neon-mul16.c.o
[ 40%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-25p16c-minmax-fp32-neon-mul16.c.o
[ 40%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-25p16c-minmax-rndnu-neon-mla8-ld64.c.o
[ 40%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/functorch/LegacyVmapTransforms.cpp.o
[ 40%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-25p16c-minmax-rndnu-neon-mla8-ld128.c.o
[ 40%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-25p16c-minmax-rndnu-neon-mul8-ld64.c.o
[ 40%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-25p16c-minmax-rndnu-neon-mul8-ld128.c.o
[ 40%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-25p16c-minmax-rndnu-neon-mul16.c.o
[ 40%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-25p32c-minmax-fp32-neon-mul16.c.o
[ 40%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-25p32c-minmax-rndnu-neon-mul16.c.o
[ 40%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-f32-vcvt/gen/qs8-f32-vcvt-neon-u8.c.o
[ 40%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-f32-vcvt/gen/qs8-f32-vcvt-neon-u16.c.o
[ 40%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-f32-vcvt/gen/qs8-f32-vcvt-neon-u24.c.o
[ 40%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/functorch/PlumbingHelper.cpp.o
[ 40%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-f32-vcvt/gen/qs8-f32-vcvt-neon-u32.c.o
[ 40%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-gavgpool/gen/qs8-gavgpool-7p7x-minmax-fp32-neon-c8.c.o
[ 40%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-gavgpool/gen/qs8-gavgpool-7p7x-minmax-fp32-neon-c16.c.o
[ 40%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-gavgpool/gen/qs8-gavgpool-7p7x-minmax-fp32-neon-c24.c.o
[ 40%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-gavgpool/gen/qs8-gavgpool-7p7x-minmax-fp32-neon-c32.c.o
[ 40%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-gavgpool/gen/qs8-gavgpool-7p7x-minmax-rndnu-neon-c8.c.o
[ 40%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-gavgpool/gen/qs8-gavgpool-7p7x-minmax-rndnu-neon-c16.c.o
[ 40%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-gavgpool/gen/qs8-gavgpool-7p7x-minmax-rndnu-neon-c24.c.o
[ 40%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-gavgpool/gen/qs8-gavgpool-7p7x-minmax-rndnu-neon-c32.c.o
[ 40%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-gavgpool/gen/qs8-gavgpool-7x-minmax-fp32-neon-c8.c.o
[ 40%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-gavgpool/gen/qs8-gavgpool-7x-minmax-fp32-neon-c16.c.o
[ 40%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-gavgpool/gen/qs8-gavgpool-7x-minmax-fp32-neon-c24.c.o
[ 40%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-gavgpool/gen/qs8-gavgpool-7x-minmax-fp32-neon-c32.c.o
[ 40%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-gavgpool/gen/qs8-gavgpool-7x-minmax-rndnu-neon-c8.c.o
[ 40%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-gavgpool/gen/qs8-gavgpool-7x-minmax-rndnu-neon-c16.c.o
[ 40%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-gavgpool/gen/qs8-gavgpool-7x-minmax-rndnu-neon-c24.c.o
[ 40%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-gavgpool/gen/qs8-gavgpool-7x-minmax-rndnu-neon-c32.c.o
[ 40%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-3p8c-minmax-fp32-neon-mla8-ld64.c.o
[ 40%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-3p16c-minmax-fp32-neon-mla8-ld64.c.o
[ 40%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-3p16c-minmax-fp32-neon-mla8-ld128.c.o
[ 40%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-4p8c-minmax-fp32-neon-mla8-ld64.c.o
[ 40%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/functorch/PyTorchOperatorHacks.cpp.o
[ 40%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-5f5m5l8c8s8r-minmax-fp32-neon-mla8-ld64.c.o
[ 40%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-5f5m5l8c8s8r-minmax-fp32-neon-mul8-ld64.c.o
[ 40%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-5f5m5l8c8s8r-minmax-fp32-neon-mul16.c.o
[ 40%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-5f5m5l16c8s8r-minmax-fp32-neon-mla8-ld64.c.o
[ 40%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-5f5m5l16c8s8r-minmax-fp32-neon-mla8-ld128.c.o
[ 40%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-5f5m5l16c8s8r-minmax-fp32-neon-mul8-ld64.c.o
[ 41%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-5f5m5l16c8s8r-minmax-fp32-neon-mul8-ld128.c.o
[ 41%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-5f5m5l16c8s8r-minmax-fp32-neon-mul16.c.o
[ 42%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/functorch/TensorWrapper.cpp.o
[ 42%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-5f5m5l32c8s8r-minmax-fp32-neon-mul16.c.o
[ 42%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-6f6m7l8c8s8r-minmax-fp32-neon-mla8-ld64.c.o
[ 42%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-6f6m7l8c8s8r-minmax-fp32-neon-mul8-ld64.c.o
[ 42%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-6f6m7l8c8s8r-minmax-fp32-neon-mul16.c.o
[ 42%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-6f6m7l16c8s8r-minmax-fp32-neon-mla8-ld64.c.o
[ 42%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-6f6m7l16c8s8r-minmax-fp32-neon-mla8-ld128.c.o
[ 42%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-6f6m7l16c8s8r-minmax-fp32-neon-mul8-ld64.c.o
[ 42%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-6f6m7l16c8s8r-minmax-fp32-neon-mul8-ld128.c.o
[ 42%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-6f6m7l16c8s8r-minmax-fp32-neon-mul16.c.o
[ 42%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-6f6m7l32c8s8r-minmax-fp32-neon-mul16.c.o
[ 42%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-8f8m9l8c8s8r-minmax-fp32-neon-mla8-ld64.c.o
[ 42%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-8f8m9l8c8s8r-minmax-fp32-neon-mul8-ld64.c.o
[ 42%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-8f8m9l8c8s8r-minmax-fp32-neon-mul16.c.o
[ 42%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-8f8m9l16c8s8r-minmax-fp32-neon-mla8-ld64.c.o
[ 42%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-8f8m9l16c8s8r-minmax-fp32-neon-mla8-ld128.c.o
[ 42%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-8f8m9l16c8s8r-minmax-fp32-neon-mul8-ld64.c.o
[ 42%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-8f8m9l16c8s8r-minmax-fp32-neon-mul8-ld128.c.o
[ 42%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-8f8m9l16c8s8r-minmax-fp32-neon-mul16.c.o
[ 42%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-8f8m9l32c8s8r-minmax-fp32-neon-mul16.c.o
[ 42%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-9p8c-minmax-fp32-neon-mla8-ld64.c.o
[ 42%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/functorch/VmapInterpreter.cpp.o
[ 42%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-9p8c-minmax-fp32-neon-mul8-ld64.c.o
[ 42%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-9p8c-minmax-fp32-neon-mul16.c.o
[ 42%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-9p16c-minmax-fp32-neon-mla8-ld64.c.o
[ 42%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-9p16c-minmax-fp32-neon-mla8-ld128.c.o
[ 42%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-9p16c-minmax-fp32-neon-mul8-ld64.c.o
[ 42%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-9p16c-minmax-fp32-neon-mul8-ld128.c.o
[ 42%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-9p16c-minmax-fp32-neon-mul16.c.o
[ 42%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/functorch/VmapModeRegistrations.cpp.o
[ 42%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-9p32c-minmax-fp32-neon-mul16.c.o
[ 42%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-25p8c-minmax-fp32-neon-mla8-ld64.c.o
[ 42%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-25p8c-minmax-fp32-neon-mul8-ld64.c.o
[ 42%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-25p8c-minmax-fp32-neon-mul16.c.o
[ 42%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-25p16c-minmax-fp32-neon-mla8-ld64.c.o
[ 42%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-25p16c-minmax-fp32-neon-mla8-ld128.c.o
[ 42%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-25p16c-minmax-fp32-neon-mul8-ld64.c.o
[ 42%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/record_function.cpp.o
[ 42%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-25p16c-minmax-fp32-neon-mul8-ld128.c.o
[ 42%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/core/ATenGeneral.cpp.o
[ 42%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/core/BackendSelectFallbackKernel.cpp.o
[ 42%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-25p16c-minmax-fp32-neon-mul16.c.o
[ 42%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-25p32c-minmax-fp32-neon-mul16.c.o
[ 42%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-1x8-minmax-fp32-neon-mlal-lane-prfm.c.o
[ 42%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-1x8-minmax-fp32-neon-mlal-lane.c.o
[ 42%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-1x8c2-minmax-fp32-neon-mlal-dup.c.o
[ 42%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-1x8c2-minmax-fp32-neon-mlal-ld1r.c.o
[ 42%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-1x8c2-minmax-fp32-neon-mlal-ld2r.c.o
[ 42%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-1x8c2-minmax-fp32-neon-mlal-ld4r.c.o
[ 42%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-1x8c2s4-minmax-fp32-neon-mlal.c.o
[ 42%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-1x8c4-minmax-fp32-neon-mlal-dup.c.o
[ 42%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-1x8c4-minmax-fp32-neon-mlal-ld1r.c.o
[ 42%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-1x8c4-minmax-fp32-neon-mlal-ld2r.c.o
[ 42%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-1x8c4s2-minmax-fp32-neon-mlal.c.o
[ 42%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-1x8c8-minmax-fp32-neon-mlal.c.o
[ 42%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/core/DeprecatedTypeProperties.cpp.o
[ 42%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-1x16-minmax-fp32-neon-mlal-lane-prfm.c.o
[ 42%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-1x16-minmax-fp32-neon-mlal-lane.c.o
[ 42%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/core/DeprecatedTypePropertiesRegistry.cpp.o
[ 42%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-2x8-minmax-fp32-neon-mlal-lane-prfm.c.o
[ 42%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/core/Dict.cpp.o
[ 42%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-2x8-minmax-fp32-neon-mlal-lane.c.o
[ 42%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-2x8c2-minmax-fp32-neon-mlal-dup.c.o
[ 42%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-2x8c2-minmax-fp32-neon-mlal-ld1r.c.o
[ 42%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-2x8c2-minmax-fp32-neon-mlal-ld2r.c.o
[ 42%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/core/Dimname.cpp.o
[ 42%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-2x8c2-minmax-fp32-neon-mlal-ld4r.c.o
[ 42%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/core/Formatting.cpp.o
[ 43%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-2x8c2s4-minmax-fp32-neon-mlal.c.o
[ 43%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-2x8c4-minmax-fp32-neon-mlal-dup.c.o
[ 43%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/core/Generator.cpp.o
[ 43%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-2x8c4-minmax-fp32-neon-mlal-ld1r.c.o
[ 43%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-2x8c4-minmax-fp32-neon-mlal-ld2r.c.o
[ 43%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-2x8c4s2-minmax-fp32-neon-mlal.c.o
[ 43%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-2x8c8-minmax-fp32-neon-mlal.c.o
[ 43%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-2x16-minmax-fp32-neon-mlal-lane-prfm.c.o
[ 43%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/core/GeneratorForPrivateuseone.cpp.o
[ 43%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-2x16-minmax-fp32-neon-mlal-lane.c.o
[ 43%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/core/List.cpp.o
[ 43%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-3x8-minmax-fp32-neon-mlal-lane-prfm.c.o
[ 43%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-3x8-minmax-fp32-neon-mlal-lane.c.o
[ 43%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/core/MetaFallbackKernel.cpp.o
[ 43%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-3x16-minmax-fp32-neon-mlal-lane-prfm.c.o
[ 43%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/core/NamedRegistrations.cpp.o
[ 43%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-3x16-minmax-fp32-neon-mlal-lane.c.o
[ 43%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-4x8-minmax-fp32-neon-mlal-lane-prfm.c.o
[ 43%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-4x8-minmax-fp32-neon-mlal-lane.c.o
[ 43%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-4x16-minmax-fp32-neon-mlal-lane-prfm.c.o
[ 43%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-4x16-minmax-fp32-neon-mlal-lane.c.o
[ 43%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-6x8-minmax-fp32-neon-mlal-lane-prfm.c.o
[ 43%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-6x8-minmax-fp32-neon-mlal-lane.c.o
[ 43%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/core/NamedTensor.cpp.o
[ 43%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-6x16-minmax-fp32-neon-mlal-lane-prfm.c.o
[ 43%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-6x16-minmax-fp32-neon-mlal-lane.c.o
[ 43%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/core/NestedIntSymNodeImpl.cpp.o
[ 43%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-1x8-minmax-fp32-neon-mlal-lane-prfm.c.o
[ 43%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-1x8-minmax-fp32-neon-mlal-lane.c.o
[ 43%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/core/PythonFallbackKernel.cpp.o
[ 43%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-1x8c2-minmax-fp32-neon-mlal-dup.c.o
[ 43%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-1x8c2-minmax-fp32-neon-mlal-ld1r.c.o
[ 43%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/core/PythonOpRegistrationTrampoline.cpp.o
[ 43%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-1x8c2-minmax-fp32-neon-mlal-ld2r.c.o
[ 43%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-1x8c2-minmax-fp32-neon-mlal-ld4r.c.o
[ 43%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-1x8c2s4-minmax-fp32-neon-mlal.c.o
[ 43%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-1x8c4-minmax-fp32-neon-mlal-dup.c.o
[ 43%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-1x8c4-minmax-fp32-neon-mlal-ld1r.c.o
[ 43%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-1x8c4-minmax-fp32-neon-mlal-ld2r.c.o
[ 43%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-1x8c4s2-minmax-fp32-neon-mlal.c.o
[ 43%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/core/Range.cpp.o
[ 43%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-1x8c8-minmax-fp32-neon-mlal.c.o
[ 43%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/core/Tensor.cpp.o
[ 43%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-1x16-minmax-fp32-neon-mlal-lane-prfm.c.o
[ 43%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-1x16-minmax-fp32-neon-mlal-lane.c.o
[ 43%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-2x8-minmax-fp32-neon-mlal-lane-prfm.c.o
[ 43%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-2x8-minmax-fp32-neon-mlal-lane.c.o
[ 43%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-2x8c2-minmax-fp32-neon-mlal-dup.c.o
[ 43%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/core/TorchDispatchUtils.cpp.o
[ 43%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-2x8c2-minmax-fp32-neon-mlal-ld1r.c.o
[ 43%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-2x8c2-minmax-fp32-neon-mlal-ld2r.c.o
[ 43%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/core/VariableFallbackKernel.cpp.o
[ 43%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-2x8c2-minmax-fp32-neon-mlal-ld4r.c.o
[ 43%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-2x8c2s4-minmax-fp32-neon-mlal.c.o
[ 43%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-2x8c4-minmax-fp32-neon-mlal-dup.c.o
[ 43%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-2x8c4-minmax-fp32-neon-mlal-ld1r.c.o
[ 43%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-2x8c4-minmax-fp32-neon-mlal-ld2r.c.o
[ 43%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-2x8c4s2-minmax-fp32-neon-mlal.c.o
[ 43%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-2x8c8-minmax-fp32-neon-mlal.c.o
[ 43%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/core/VariableHooksInterface.cpp.o
[ 43%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-2x16-minmax-fp32-neon-mlal-lane-prfm.c.o
[ 43%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-2x16-minmax-fp32-neon-mlal-lane.c.o
[ 43%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-3x8-minmax-fp32-neon-mlal-lane-prfm.c.o
[ 43%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-3x8-minmax-fp32-neon-mlal-lane.c.o
[ 43%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-3x16-minmax-fp32-neon-mlal-lane-prfm.c.o
[ 43%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/core/Vitals.cpp.o
[ 43%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-3x16-minmax-fp32-neon-mlal-lane.c.o
[ 43%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/core/adaption.cpp.o
[ 43%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-4x8-minmax-fp32-neon-mlal-lane-prfm.c.o
[ 43%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/core/blob.cpp.o
[ 43%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-4x8-minmax-fp32-neon-mlal-lane.c.o
[ 43%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/core/boxing/KernelFunction.cpp.o
[ 43%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-4x16-minmax-fp32-neon-mlal-lane-prfm.c.o
[ 43%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-4x16-minmax-fp32-neon-mlal-lane.c.o
[ 43%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/core/class_type.cpp.o
[ 43%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-6x8-minmax-fp32-neon-mlal-lane-prfm.c.o
[ 43%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-6x8-minmax-fp32-neon-mlal-lane.c.o
[ 43%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-6x16-minmax-fp32-neon-mlal-lane-prfm.c.o
[ 43%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-6x16-minmax-fp32-neon-mlal-lane.c.o
[ 43%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/core/custom_class.cpp.o
[ 44%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-requantization/qs8-requantization-fp32-neon.c.o
[ 44%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-requantization/qs8-requantization-gemmlowp-neon.c.o
[ 44%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-requantization/qs8-requantization-rndna-neon.c.o
[ 44%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-requantization/qs8-requantization-rndnu-neon-mull.c.o
[ 44%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/core/dispatch/DispatchKeyExtractor.cpp.o
[ 44%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-requantization/qs8-requantization-rndnu-neon-qdmulh.c.o
[ 44%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vadd/gen/qs8-vadd-minmax-neon-ld64-u8.c.o
[ 44%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vadd/gen/qs8-vadd-minmax-neon-ld64-u16.c.o
[ 44%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vadd/gen/qs8-vadd-minmax-neon-ld64-u24.c.o
[ 44%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vadd/gen/qs8-vadd-minmax-neon-ld64-u32.c.o
[ 44%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vadd/gen/qs8-vadd-minmax-neon-ld128-u16.c.o
[ 44%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vadd/gen/qs8-vadd-minmax-neon-ld128-u32.c.o
[ 44%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vaddc/gen/qs8-vaddc-minmax-neon-ld64-u8.c.o
[ 44%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vaddc/gen/qs8-vaddc-minmax-neon-ld64-u16.c.o
[ 44%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vaddc/gen/qs8-vaddc-minmax-neon-ld64-u24.c.o
[ 44%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vaddc/gen/qs8-vaddc-minmax-neon-ld64-u32.c.o
[ 44%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vaddc/gen/qs8-vaddc-minmax-neon-ld128-u16.c.o
[ 44%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/core/dispatch/Dispatcher.cpp.o
[ 44%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vaddc/gen/qs8-vaddc-minmax-neon-ld128-u32.c.o
[ 44%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vcvt/gen/qs8-vcvt-neon-u8.c.o
[ 44%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vcvt/gen/qs8-vcvt-neon-u16.c.o
[ 44%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vcvt/gen/qs8-vcvt-neon-u32.c.o
[ 44%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/core/dispatch/ObservedOperators.cpp.o
[ 44%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vhswish/gen/qs8-vhswish-neon-u8.c.o
[ 44%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vhswish/gen/qs8-vhswish-neon-u16.c.o
[ 44%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vhswish/gen/qs8-vhswish-neon-u32.c.o
[ 44%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/core/dispatch/OperatorEntry.cpp.o
[ 44%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vlrelu/gen/qs8-vlrelu-neon-u8.c.o
[ 44%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vlrelu/gen/qs8-vlrelu-neon-u16.c.o
[ 44%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vlrelu/gen/qs8-vlrelu-neon-u32.c.o
[ 44%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vmul/gen/qs8-vmul-minmax-fp32-neon-ld64-u8.c.o
[ 44%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vmul/gen/qs8-vmul-minmax-fp32-neon-ld64-u16.c.o
[ 44%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vmul/gen/qs8-vmul-minmax-fp32-neon-ld128-u16.c.o
[ 44%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/core/dynamic_type.cpp.o
[ 44%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vmul/gen/qs8-vmul-minmax-rndnu-neon-ld64-u8.c.o
[ 44%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vmul/gen/qs8-vmul-minmax-rndnu-neon-ld64-u16.c.o
[ 44%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vmul/gen/qs8-vmul-minmax-rndnu-neon-ld128-u16.c.o
[ 44%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vmulc/gen/qs8-vmulc-minmax-fp32-neon-ld64-u8.c.o
[ 44%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vmulc/gen/qs8-vmulc-minmax-fp32-neon-ld64-u16.c.o
[ 44%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vmulc/gen/qs8-vmulc-minmax-fp32-neon-ld128-u16.c.o
[ 44%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vmulc/gen/qs8-vmulc-minmax-rndnu-neon-ld64-u8.c.o
[ 44%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vmulc/gen/qs8-vmulc-minmax-rndnu-neon-ld64-u16.c.o
[ 44%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vmulc/gen/qs8-vmulc-minmax-rndnu-neon-ld128-u16.c.o
[ 44%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs16-qs8-vcvt/gen/qs16-qs8-vcvt-neon-u8.c.o
[ 44%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs16-qs8-vcvt/gen/qs16-qs8-vcvt-neon-u16.c.o
[ 44%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs16-qs8-vcvt/gen/qs16-qs8-vcvt-neon-u32.c.o
[ 44%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-avgpool/qu8-avgpool-9p8x-minmax-fp32-neon-c8.c.o
[ 44%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-avgpool/qu8-avgpool-9x-minmax-fp32-neon-c8.c.o
[ 44%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-5f5m5l8c8s8r-minmax-fp32-neon-mul16.c.o
[ 44%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-5f5m5l8c8s8r-minmax-rndnu-neon-mul8.c.o
[ 44%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/core/function_schema.cpp.o
[ 44%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-5f5m5l8c8s8r-minmax-rndnu-neon-mul16.c.o
[ 44%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-5f5m5l16c8s8r-minmax-fp32-neon-mul16.c.o
[ 44%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-5f5m5l16c8s8r-minmax-rndnu-neon-mul8.c.o
[ 44%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/core/interned_strings.cpp.o
[ 44%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-5f5m5l16c8s8r-minmax-rndnu-neon-mul16.c.o
[ 44%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-5f5m5l32c8s8r-minmax-fp32-neon-mul16.c.o
[ 44%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/core/ivalue.cpp.o
[ 44%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-5f5m5l32c8s8r-minmax-rndnu-neon-mul8.c.o
[ 44%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-5f5m5l32c8s8r-minmax-rndnu-neon-mul16.c.o
[ 44%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-6f6m7l8c8s8r-minmax-fp32-neon-mul16.c.o
[ 44%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-6f6m7l8c8s8r-minmax-rndnu-neon-mul8.c.o
[ 44%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-6f6m7l8c8s8r-minmax-rndnu-neon-mul16.c.o
[ 44%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-6f6m7l16c8s8r-minmax-fp32-neon-mul16.c.o
[ 44%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-6f6m7l16c8s8r-minmax-rndnu-neon-mul8.c.o
[ 44%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-6f6m7l16c8s8r-minmax-rndnu-neon-mul16.c.o
[ 44%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-6f6m7l32c8s8r-minmax-fp32-neon-mul16.c.o
[ 44%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/core/library.cpp.o
[ 44%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/core/op_registration/infer_schema.cpp.o
[ 44%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-6f6m7l32c8s8r-minmax-rndnu-neon-mul8.c.o
[ 45%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-6f6m7l32c8s8r-minmax-rndnu-neon-mul16.c.o
[ 45%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-8f8m9l8c8s8r-minmax-fp32-neon-mul16.c.o
[ 45%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-8f8m9l8c8s8r-minmax-rndnu-neon-mul8.c.o
[ 45%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-8f8m9l8c8s8r-minmax-rndnu-neon-mul16.c.o
[ 45%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-8f8m9l16c8s8r-minmax-fp32-neon-mul16.c.o
[ 45%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-8f8m9l16c8s8r-minmax-rndnu-neon-mul8.c.o
[ 45%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-8f8m9l16c8s8r-minmax-rndnu-neon-mul16.c.o
[ 45%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-8f8m9l32c8s8r-minmax-fp32-neon-mul16.c.o
[ 45%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-8f8m9l32c8s8r-minmax-rndnu-neon-mul8.c.o
[ 45%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-8f8m9l32c8s8r-minmax-rndnu-neon-mul16.c.o
[ 45%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-9p8c-minmax-fp32-neon-mul16.c.o
[ 45%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-9p8c-minmax-rndnu-neon-mul8.c.o
[ 45%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-9p8c-minmax-rndnu-neon-mul16.c.o
[ 45%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-9p16c-minmax-fp32-neon-mul16.c.o
[ 45%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-9p16c-minmax-rndnu-neon-mul8.c.o
[ 45%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/core/op_registration/op_registration.cpp.o
[ 45%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-9p16c-minmax-rndnu-neon-mul16.c.o
[ 45%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/core/operator_name.cpp.o
[ 45%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-9p32c-minmax-fp32-neon-mul16.c.o
[ 45%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-9p32c-minmax-rndnu-neon-mul8.c.o
[ 45%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/core/register_symbols.cpp.o
[ 45%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-9p32c-minmax-rndnu-neon-mul16.c.o
[ 45%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-25p8c-minmax-fp32-neon-mul16.c.o
[ 45%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/core/tensor_type.cpp.o
[ 45%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-25p8c-minmax-rndnu-neon-mul8.c.o
[ 45%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/core/type.cpp.o
[ 45%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-25p8c-minmax-rndnu-neon-mul16.c.o
[ 45%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-25p16c-minmax-fp32-neon-mul16.c.o
[ 45%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-25p16c-minmax-rndnu-neon-mul8.c.o
[ 45%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-25p16c-minmax-rndnu-neon-mul16.c.o
[ 45%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-25p32c-minmax-fp32-neon-mul16.c.o
[ 45%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-25p32c-minmax-rndnu-neon-mul8.c.o
[ 45%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-25p32c-minmax-rndnu-neon-mul16.c.o
[ 45%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-f32-vcvt/gen/qu8-f32-vcvt-neon-u8.c.o
[ 45%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-f32-vcvt/gen/qu8-f32-vcvt-neon-u16.c.o
[ 45%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/core/type_factory.cpp.o
[ 45%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-f32-vcvt/gen/qu8-f32-vcvt-neon-u24.c.o
[ 45%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-f32-vcvt/gen/qu8-f32-vcvt-neon-u32.c.o
[ 45%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gavgpool/gen/qu8-gavgpool-7p7x-minmax-fp32-neon-c8.c.o
[ 45%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gavgpool/gen/qu8-gavgpool-7p7x-minmax-fp32-neon-c16.c.o
[ 45%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gavgpool/gen/qu8-gavgpool-7p7x-minmax-fp32-neon-c24.c.o
[ 45%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/core/union_type.cpp.o
[ 45%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gavgpool/gen/qu8-gavgpool-7p7x-minmax-fp32-neon-c32.c.o
[ 45%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gavgpool/gen/qu8-gavgpool-7p7x-minmax-rndnu-neon-c8.c.o
[ 45%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gavgpool/gen/qu8-gavgpool-7p7x-minmax-rndnu-neon-c16.c.o
[ 45%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gavgpool/gen/qu8-gavgpool-7p7x-minmax-rndnu-neon-c24.c.o
[ 45%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gavgpool/gen/qu8-gavgpool-7p7x-minmax-rndnu-neon-c32.c.o
[ 45%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gavgpool/gen/qu8-gavgpool-7x-minmax-fp32-neon-c8.c.o
[ 45%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gavgpool/gen/qu8-gavgpool-7x-minmax-fp32-neon-c16.c.o
[ 45%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gavgpool/gen/qu8-gavgpool-7x-minmax-fp32-neon-c24.c.o
[ 45%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/frontend/error_report.cpp.o
[ 45%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gavgpool/gen/qu8-gavgpool-7x-minmax-fp32-neon-c32.c.o
[ 45%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gavgpool/gen/qu8-gavgpool-7x-minmax-rndnu-neon-c8.c.o
[ 45%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gavgpool/gen/qu8-gavgpool-7x-minmax-rndnu-neon-c16.c.o
[ 45%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gavgpool/gen/qu8-gavgpool-7x-minmax-rndnu-neon-c24.c.o
[ 45%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gavgpool/gen/qu8-gavgpool-7x-minmax-rndnu-neon-c32.c.o
[ 45%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-1x8-minmax-fp32-neon-mlal-lane.c.o
[ 45%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/frontend/function_schema_parser.cpp.o
[ 45%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-1x8-minmax-rndnu-neon-mlal-lane.c.o
[ 45%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/frontend/lexer.cpp.o
[ 45%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-1x16-minmax-fp32-neon-mlal-lane.c.o
[ 45%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-1x16-minmax-rndnu-neon-mlal-lane.c.o
[ 45%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-2x8-minmax-rndnu-neon-mlal-lane.c.o
[ 45%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-2x16-minmax-rndnu-neon-mlal-lane.c.o
[ 45%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/frontend/schema_type_parser.cpp.o
[ 45%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-3x8-minmax-rndnu-neon-mlal-lane.c.o
[ 45%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-3x16-minmax-rndnu-neon-mlal-lane.c.o
[ 45%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-4x8-minmax-fp32-neon-mlal-lane.c.o
[ 45%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-4x8-minmax-rndnu-neon-mlal-lane.c.o
[ 45%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-4x16-minmax-fp32-neon-mlal-lane.c.o
[ 46%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-4x16-minmax-rndnu-neon-mlal-lane.c.o
[ 46%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-6x8-minmax-rndnu-neon-mlal-lane.c.o
[ 46%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-6x16-minmax-rndnu-neon-mlal-lane.c.o
[ 46%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-1x8-minmax-fp32-neon-mlal-lane.c.o
[ 46%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-1x8-minmax-rndnu-neon-mlal-lane.c.o
[ 46%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-1x16-minmax-fp32-neon-mlal-lane.c.o
[ 46%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-1x16-minmax-rndnu-neon-mlal-lane.c.o
[ 46%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-2x8-minmax-rndnu-neon-mlal-lane.c.o
[ 46%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-2x16-minmax-rndnu-neon-mlal-lane.c.o
[ 46%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-3x8-minmax-rndnu-neon-mlal-lane.c.o
[ 46%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-3x16-minmax-rndnu-neon-mlal-lane.c.o
[ 46%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/frontend/strtod.cpp.o
[ 46%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-4x8-minmax-fp32-neon-mlal-lane.c.o
[ 46%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/frontend/source_range.cpp.o
[ 46%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-4x8-minmax-rndnu-neon-mlal-lane.c.o
[ 46%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-4x16-minmax-fp32-neon-mlal-lane.c.o
[ 46%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-4x16-minmax-rndnu-neon-mlal-lane.c.o
[ 46%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-6x8-minmax-rndnu-neon-mlal-lane.c.o
[ 46%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-6x16-minmax-rndnu-neon-mlal-lane.c.o
[ 46%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/Activation.cpp.o
[ 46%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-requantization/qu8-requantization-fp32-neon.c.o
[ 46%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-requantization/qu8-requantization-gemmlowp-neon.c.o
[ 46%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-requantization/qu8-requantization-rndna-neon.c.o
[ 46%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vadd/gen/qu8-vadd-minmax-neon-ld64-u8.c.o
[ 46%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vadd/gen/qu8-vadd-minmax-neon-ld64-u16.c.o
[ 46%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vadd/gen/qu8-vadd-minmax-neon-ld64-u32.c.o
[ 46%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/AdaptiveAveragePooling.cpp.o
[ 46%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vadd/gen/qu8-vadd-minmax-neon-ld128-u16.c.o
[ 46%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vaddc/gen/qu8-vaddc-minmax-neon-ld64-u8.c.o
[ 46%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vaddc/gen/qu8-vaddc-minmax-neon-ld64-u16.c.o
[ 46%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vaddc/gen/qu8-vaddc-minmax-neon-ld64-u32.c.o
[ 46%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vaddc/gen/qu8-vaddc-minmax-neon-ld128-u16.c.o
[ 46%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vcvt/gen/qu8-vcvt-neon-u8.c.o
[ 46%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vcvt/gen/qu8-vcvt-neon-u16.c.o
[ 46%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vcvt/gen/qu8-vcvt-neon-u32.c.o
[ 46%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vhswish/gen/qu8-vhswish-neon-u8.c.o
[ 46%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vhswish/gen/qu8-vhswish-neon-u16.c.o
[ 46%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vhswish/gen/qu8-vhswish-neon-u32.c.o
[ 46%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vlrelu/gen/qu8-vlrelu-neon-u8.c.o
[ 46%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vlrelu/gen/qu8-vlrelu-neon-u16.c.o
[ 46%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vlrelu/gen/qu8-vlrelu-neon-u32.c.o
[ 46%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vmul/gen/qu8-vmul-minmax-fp32-neon-ld64-u8.c.o
[ 46%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vmul/gen/qu8-vmul-minmax-fp32-neon-ld64-u16.c.o
[ 46%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/AdaptiveAveragePooling3d.cpp.o
[ 46%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vmul/gen/qu8-vmul-minmax-fp32-neon-ld128-u16.c.o
[ 46%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/AdaptiveMaxPooling2d.cpp.o
[ 46%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vmul/gen/qu8-vmul-minmax-rndnu-neon-ld64-u8.c.o
[ 46%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/AdaptiveMaxPooling3d.cpp.o
[ 46%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vmul/gen/qu8-vmul-minmax-rndnu-neon-ld64-u16.c.o
[ 46%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vmul/gen/qu8-vmul-minmax-rndnu-neon-ld128-u16.c.o
[ 46%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vmulc/gen/qu8-vmulc-minmax-fp32-neon-ld64-u8.c.o
[ 46%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vmulc/gen/qu8-vmulc-minmax-fp32-neon-ld64-u16.c.o
[ 46%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vmulc/gen/qu8-vmulc-minmax-fp32-neon-ld128-u16.c.o
[ 46%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vmulc/gen/qu8-vmulc-minmax-rndnu-neon-ld64-u8.c.o
[ 46%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vmulc/gen/qu8-vmulc-minmax-rndnu-neon-ld64-u16.c.o
[ 46%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vmulc/gen/qu8-vmulc-minmax-rndnu-neon-ld128-u16.c.o
[ 46%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/AffineGridGenerator.cpp.o
[ 46%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/s8-ibilinear/gen/s8-ibilinear-neon-c8.c.o
[ 46%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/s8-ibilinear/gen/s8-ibilinear-neon-c16.c.o
[ 46%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/s8-maxpool/s8-maxpool-2p2x-minmax-neon-c16.c.o
[ 46%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/s8-maxpool/s8-maxpool-4p3x-minmax-neon-c16.c.o
[ 46%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/s8-maxpool/s8-maxpool-9p8x-minmax-neon-c16.c.o
[ 46%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/s8-vclamp/s8-vclamp-neon-u64.c.o
[ 47%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/AmpKernels.cpp.o
[ 47%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/s16-rmaxabs/gen/s16-rmaxabs-neon-x8.c.o
[ 47%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/s16-rmaxabs/gen/s16-rmaxabs-neon-x16.c.o
[ 47%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/s16-rmaxabs/gen/s16-rmaxabs-neon-x24.c.o
[ 47%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/s16-rmaxabs/gen/s16-rmaxabs-neon-x32.c.o
[ 47%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/AutogradComposite.cpp.o
[ 47%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/s16-window/gen/s16-window-neon-u8.c.o
[ 48%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/s16-window/gen/s16-window-neon-u16.c.o
[ 48%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/s16-window/gen/s16-window-neon-u24.c.o
[ 48%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/s16-window/gen/s16-window-neon-u32.c.o
[ 48%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/s16-window/gen/s16-window-shift12-neon-u8.c.o
[ 48%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/s16-window/gen/s16-window-shift12-neon-u16.c.o
[ 48%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/s16-window/gen/s16-window-shift12-neon-u24.c.o
[ 48%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/s16-window/gen/s16-window-shift12-neon-u32.c.o
[ 48%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/AveragePool2d.cpp.o
[ 48%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/s16-window/gen/s16-window-shift15-neon-u8.c.o
[ 48%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/s16-window/gen/s16-window-shift15-neon-u16.c.o
[ 48%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/AveragePool3d.cpp.o
[ 48%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/s16-window/gen/s16-window-shift15-neon-u24.c.o
[ 48%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/s16-window/gen/s16-window-shift15-neon-u32.c.o
[ 48%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/u8-ibilinear/gen/u8-ibilinear-neon-c8.c.o
[ 48%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/u8-ibilinear/gen/u8-ibilinear-neon-c16.c.o
[ 48%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/u8-maxpool/u8-maxpool-9p8x-minmax-neon-c16.c.o
[ 48%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/u8-rmax/u8-rmax-neon-u16.c.o
[ 48%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/BatchLinearAlgebra.cpp.o
[ 48%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/u8-vclamp/u8-vclamp-neon-u64.c.o
[ 48%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/u32-filterbank-accumulate/gen/u32-filterbank-accumulate-neon-x1.c.o
[ 48%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/u32-filterbank-accumulate/gen/u32-filterbank-accumulate-neon-x2.c.o
[ 48%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x8-transposec/gen/x8-transposec-8x8-multi-dec-zip-neon.c.o
[ 48%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x8-transposec/gen/x8-transposec-8x8-multi-mov-zip-neon.c.o
[ 48%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x8-transposec/gen/x8-transposec-8x8-multi-switch-zip-neon.c.o
[ 48%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x8-transposec/gen/x8-transposec-8x8-reuse-dec-zip-neon.c.o
[ 48%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x8-transposec/gen/x8-transposec-8x8-reuse-mov-zip-neon.c.o
[ 48%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/BatchLinearAlgebraKernel.cpp.o
[ 48%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x8-transposec/gen/x8-transposec-8x8-reuse-multi-zip-neon.c.o
[ 48%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x8-transposec/gen/x8-transposec-8x8-reuse-switch-zip-neon.c.o
[ 48%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x8-transposec/gen/x8-transposec-16x16-reuse-dec-zip-neon.c.o
[ 48%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x8-transposec/gen/x8-transposec-16x16-reuse-mov-zip-neon.c.o
[ 48%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x8-transposec/gen/x8-transposec-16x16-reuse-switch-zip-neon.c.o
[ 48%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/BinaryOps.cpp.o
[ 48%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x8-zip/x8-zip-x2-neon.c.o
[ 48%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x8-zip/x8-zip-x3-neon.c.o
[ 48%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x8-zip/x8-zip-x4-neon.c.o
[ 48%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x8-zip/x8-zip-xm-neon.c.o
[ 48%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x16-packw/gen/x16-packw-x8-gemm-goi-neon-ld4lane-u4-prfm.c.o
[ 48%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x16-packw/gen/x16-packw-x8-gemm-goi-neon-ld4lane-u4.c.o
[ 48%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x16-packw/gen/x16-packw-x8-gemm-goi-neon-ld4lane-u8-prfm.c.o
[ 48%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x16-packw/gen/x16-packw-x8-gemm-goi-neon-ld4lane-u8.c.o
[ 48%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x16-packw/gen/x16-packw-x8-gemm-goi-neon-ld4lane-u12-prfm.c.o
[ 48%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x16-packw/gen/x16-packw-x8-gemm-goi-neon-ld4lane-u12.c.o
[ 48%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x16-packw/gen/x16-packw-x8-gemm-goi-neon-ld4lane-u16-prfm.c.o
[ 48%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x16-packw/gen/x16-packw-x8-gemm-goi-neon-ld4lane-u16.c.o
[ 48%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x16-packw/gen/x16-packw-x16-gemm-goi-neon-ld4lane-u4-prfm.c.o
[ 48%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x16-packw/gen/x16-packw-x16-gemm-goi-neon-ld4lane-u4.c.o
[ 48%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x16-packw/gen/x16-packw-x16-gemm-goi-neon-ld4lane-u8-prfm.c.o
[ 48%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/Blas.cpp.o
[ 48%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x16-packw/gen/x16-packw-x16-gemm-goi-neon-ld4lane-u8.c.o
[ 48%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x16-packw/gen/x16-packw-x16-gemm-goi-neon-ld4lane-u12-prfm.c.o
[ 48%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/BlasKernel.cpp.o
[ 48%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/Bucketization.cpp.o
[ 48%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x16-packw/gen/x16-packw-x16-gemm-goi-neon-ld4lane-u12.c.o
[ 48%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x16-packw/gen/x16-packw-x16-gemm-goi-neon-ld4lane-u16-prfm.c.o
[ 48%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/CPUBlas.cpp.o
[ 48%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x16-packw/gen/x16-packw-x16-gemm-goi-neon-ld4lane-u16.c.o
[ 48%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x16-transposec/gen/x16-transposec-4x4-multi-dec-zip-neon.c.o
[ 48%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x16-transposec/gen/x16-transposec-4x4-multi-mov-zip-neon.c.o
[ 48%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x16-transposec/gen/x16-transposec-4x4-multi-multi-zip-neon.c.o
[ 48%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x16-transposec/gen/x16-transposec-4x4-multi-switch-zip-neon.c.o
[ 48%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x16-transposec/gen/x16-transposec-4x4-reuse-dec-zip-neon.c.o
[ 48%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/CPUFallback.cpp.o
[ 48%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x16-transposec/gen/x16-transposec-4x4-reuse-mov-zip-neon.c.o
[ 48%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x16-transposec/gen/x16-transposec-4x4-reuse-multi-zip-neon.c.o
[ 48%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x16-transposec/gen/x16-transposec-4x4-reuse-switch-zip-neon.c.o
[ 48%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x16-transposec/gen/x16-transposec-8x8-multi-dec-zip-neon.c.o
[ 48%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x16-transposec/gen/x16-transposec-8x8-multi-mov-zip-neon.c.o
[ 48%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x16-transposec/gen/x16-transposec-8x8-multi-switch-zip-neon.c.o
[ 48%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/ChanelShuffle.cpp.o
[ 48%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x16-transposec/gen/x16-transposec-8x8-reuse-dec-zip-neon.c.o
[ 49%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x16-transposec/gen/x16-transposec-8x8-reuse-mov-zip-neon.c.o
[ 49%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x16-transposec/gen/x16-transposec-8x8-reuse-multi-zip-neon.c.o
[ 49%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x16-transposec/gen/x16-transposec-8x8-reuse-switch-zip-neon.c.o
[ 49%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/Col2Im.cpp.o
[ 49%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x24-transposec/x24-transposec-2x2-neon-tbl64.c.o
[ 49%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-packw/gen/x32-packw-x2-gemm-goi-neon-ld2lane-u2-prfm.c.o
[ 49%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-packw/gen/x32-packw-x2-gemm-goi-neon-ld2lane-u2.c.o
[ 49%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-packw/gen/x32-packw-x8-gemm-goi-neon-ld4lane-u4-prfm.c.o
[ 49%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-packw/gen/x32-packw-x8-gemm-goi-neon-ld4lane-u4.c.o
[ 49%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-packw/gen/x32-packw-x8-gemm-goi-neon-ld4lane-u8-prfm.c.o
[ 49%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-packw/gen/x32-packw-x8-gemm-goi-neon-ld4lane-u8.c.o
[ 49%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-packw/gen/x32-packw-x8s4-gemm-goi-neon-ld4lane-u4-prfm.c.o
[ 49%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-packw/gen/x32-packw-x8s4-gemm-goi-neon-ld4lane-u4.c.o
[ 49%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/ComparisonUtils.cpp.o
[ 49%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-packw/gen/x32-packw-x8s4-gemm-goi-neon-ld4lane-u8-prfm.c.o
[ 49%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-packw/gen/x32-packw-x8s4-gemm-goi-neon-ld4lane-u8.c.o
[ 49%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/Constraints.cpp.o
[ 49%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-packw/gen/x32-packw-x12-gemm-goi-neon-ld4lane-u4-prfm.c.o
[ 49%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-packw/gen/x32-packw-x12-gemm-goi-neon-ld4lane-u4.c.o
[ 49%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/Convolution.cpp.o
[ 49%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-packw/gen/x32-packw-x12-gemm-goi-neon-ld4lane-u8-prfm.c.o
[ 49%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/ConvolutionMM2d.cpp.o
[ 49%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-packw/gen/x32-packw-x12-gemm-goi-neon-ld4lane-u8.c.o
[ 49%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-packw/gen/x32-packw-x16-gemm-goi-neon-ld4lane-u4-prfm.c.o
[ 49%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-packw/gen/x32-packw-x16-gemm-goi-neon-ld4lane-u4.c.o
[ 49%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-packw/gen/x32-packw-x16-gemm-goi-neon-ld4lane-u8-prfm.c.o
[ 49%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/ConvolutionMM3d.cpp.o
[ 49%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-packw/gen/x32-packw-x16-gemm-goi-neon-ld4lane-u8.c.o
[ 49%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-packx/gen/x32-packx-4x-neon-st4-u4-prfm.c.o
[ 49%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-packx/gen/x32-packx-4x-neon-st4-u4.c.o
[ 49%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-packx/gen/x32-packx-4x-neon-st4-u8-prfm.c.o
[ 49%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-packx/gen/x32-packx-4x-neon-st4-u8.c.o
[ 49%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-packx/gen/x32-packx-8x-neon-st4-u4-prfm.c.o
[ 49%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-packx/gen/x32-packx-8x-neon-st4-u4.c.o
[ 49%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-packx/gen/x32-packx-8x-neon-st4-u8-prfm.c.o
[ 49%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-packx/gen/x32-packx-8x-neon-st4-u8.c.o
[ 49%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-transposec/gen/x32-transposec-2x2-multi-dec-zip-neon.c.o
[ 49%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-transposec/gen/x32-transposec-2x2-multi-mov-zip-neon.c.o
[ 49%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-transposec/gen/x32-transposec-2x2-multi-multi-zip-neon.c.o
[ 49%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-transposec/gen/x32-transposec-2x2-multi-switch-zip-neon.c.o
[ 49%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-transposec/gen/x32-transposec-2x2-reuse-dec-zip-neon.c.o
[ 49%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-transposec/gen/x32-transposec-2x2-reuse-mov-zip-neon.c.o
[ 49%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-transposec/gen/x32-transposec-2x2-reuse-multi-zip-neon.c.o
[ 49%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/ConvolutionTBC.cpp.o
[ 49%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-transposec/gen/x32-transposec-2x2-reuse-switch-zip-neon.c.o
[ 49%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-transposec/gen/x32-transposec-4x4-multi-dec-zip-neon.c.o
[ 49%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-transposec/gen/x32-transposec-4x4-multi-mov-zip-neon.c.o
[ 49%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-transposec/gen/x32-transposec-4x4-multi-multi-zip-neon.c.o
[ 49%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-transposec/gen/x32-transposec-4x4-multi-switch-zip-neon.c.o
[ 49%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-transposec/gen/x32-transposec-4x4-reuse-dec-zip-neon.c.o
[ 49%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/Copy.cpp.o
[ 49%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/Correlation.cpp.o
[ 49%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-transposec/gen/x32-transposec-4x4-reuse-mov-zip-neon.c.o
[ 49%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-transposec/gen/x32-transposec-4x4-reuse-multi-zip-neon.c.o
[ 49%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-transposec/gen/x32-transposec-4x4-reuse-switch-zip-neon.c.o
[ 49%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-unpool/x32-unpool-neon.c.o
[ 49%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-zip/x32-zip-x2-neon.c.o
[ 49%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-zip/x32-zip-x3-neon.c.o
[ 49%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-zip/x32-zip-x4-neon.c.o
[ 49%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-zip/x32-zip-xm-neon.c.o
[ 49%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x64-transposec/gen/x64-transposec-2x2-multi-dec-zip-neon.c.o
[ 49%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x64-transposec/gen/x64-transposec-2x2-multi-mov-zip-neon.c.o
[ 49%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x64-transposec/gen/x64-transposec-2x2-multi-multi-zip-neon.c.o
[ 49%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x64-transposec/gen/x64-transposec-2x2-multi-switch-zip-neon.c.o
[ 49%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/Cross.cpp.o
[ 49%] Building C object
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x64-transposec/gen/x64-transposec-2x2-reuse-dec-zip-neon.c.o [ 49%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x64-transposec/gen/x64-transposec-2x2-reuse-mov-zip-neon.c.o [ 49%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x64-transposec/gen/x64-transposec-2x2-reuse-multi-zip-neon.c.o [ 49%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x64-transposec/gen/x64-transposec-2x2-reuse-switch-zip-neon.c.o [ 50%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/xx-fill/xx-fill-neon-u64.c.o [ 50%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/xx-pad/xx-pad-p16-neon-u16.c.o [ 50%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-f32-vcvt/gen/f16-f32-vcvt-neonfp16-u8.c.o [ 50%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/DilatedMaxPool2d.cpp.o [ 50%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-f32-vcvt/gen/f16-f32-vcvt-neonfp16-u16.c.o [ 50%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-f32acc-rsum/gen/f16-f32acc-rsum-neonfp16-u4.c.o [ 50%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-f32acc-rsum/gen/f16-f32acc-rsum-neonfp16-u8.c.o [ 50%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-f32acc-rsum/gen/f16-f32acc-rsum-neonfp16-u16-acc2.c.o [ 50%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-f32acc-rsum/gen/f16-f32acc-rsum-neonfp16-u24-acc3.c.o [ 50%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-f32acc-rsum/gen/f16-f32acc-rsum-neonfp16-u32-acc2.c.o [ 50%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-f32acc-rsum/gen/f16-f32acc-rsum-neonfp16-u32-acc4.c.o [ 50%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/DilatedMaxPool3d.cpp.o [ 50%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-f16-vcvt/gen/f32-f16-vcvt-neonfp16-u8.c.o [ 50%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-f16-vcvt/gen/f32-f16-vcvt-neonfp16-u16.c.o [ 50%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/DispatchStub.cpp.o [ 50%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f16-f32-cvt-neonfp16.c.o [ 50%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-f16-cvt-neonfp16.c.o [ 50%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/bf16-gemm/gen/bf16-gemm-1x4c8-minmax-neonfma-shland.c.o [ 50%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/Distance.cpp.o [ 50%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/bf16-gemm/gen/bf16-gemm-1x4c8-minmax-neonfma-zip.c.o [ 50%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/bf16-gemm/gen/bf16-gemm-2x4c8-minmax-neonfma-shland.c.o [ 50%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/bf16-gemm/gen/bf16-gemm-2x4c8-minmax-neonfma-zip.c.o [ 50%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/bf16-gemm/gen/bf16-gemm-3x4c8-minmax-neonfma-shland.c.o [ 50%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/Distributions.cpp.o [ 50%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/bf16-gemm/gen/bf16-gemm-3x4c8-minmax-neonfma-zip.c.o [ 50%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/bf16-gemm/gen/bf16-gemm-4x4c8-minmax-neonfma-shland.c.o [ 50%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/bf16-gemm/gen/bf16-gemm-4x4c8-minmax-neonfma-zip.c.o [ 50%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/bf16-gemm/gen/bf16-gemm-5x4c8-minmax-neonfma-shland.c.o [ 50%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/bf16-gemm/gen/bf16-gemm-5x4c8-minmax-neonfma-zip.c.o [ 50%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-3p4c-minmax-neonfma-acc2.c.o [ 50%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/Dropout.cpp.o [ 50%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-3p4c-minmax-neonfma.c.o [ 50%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-3p8c-minmax-neonfma-acc2.c.o [ 50%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-3p8c-minmax-neonfma.c.o [ 50%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-3p16c-minmax-neonfma-acc2.c.o [ 50%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-3p16c-minmax-neonfma.c.o [ 50%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-4p4c-minmax-neonfma-acc2.c.o [ 50%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-4p4c-minmax-neonfma.c.o [ 50%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-4p8c-minmax-neonfma-acc2.c.o [ 50%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-4p8c-minmax-neonfma.c.o [ 50%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-4p16c-minmax-neonfma-acc2.c.o [ 50%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/Embedding.cpp.o [ 50%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-4p16c-minmax-neonfma.c.o [ 50%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-5f5m5l4c4s4r-minmax-neonfma-acc2.c.o [ 50%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-5f5m5l4c4s4r-minmax-neonfma.c.o [ 50%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-5f5m5l8c4s4r-minmax-neonfma-acc2.c.o [ 50%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-5f5m5l8c4s4r-minmax-neonfma.c.o [ 50%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-6f6m7l4c4s4r-minmax-neonfma-acc2.c.o [ 50%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-6f6m7l4c4s4r-minmax-neonfma.c.o [ 50%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/EmbeddingBag.cpp.o [ 50%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-6f6m7l8c4s4r-minmax-neonfma-acc2.c.o [ 50%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-6f6m7l8c4s4r-minmax-neonfma.c.o [ 50%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-8f8m9l4c4s4r-minmax-neonfma-acc2.c.o [ 50%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-8f8m9l4c4s4r-minmax-neonfma.c.o [ 50%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/Fill.cpp.o [ 50%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-8f8m9l8c4s4r-minmax-neonfma-acc2.c.o [ 50%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-8f8m9l8c4s4r-minmax-neonfma.c.o [ 50%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-9p4c-minmax-neonfma-acc2.c.o [ 50%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-9p4c-minmax-neonfma.c.o [ 50%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-9p8c-minmax-neonfma-acc2.c.o [ 50%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-9p8c-minmax-neonfma.c.o [ 50%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-9p16c-minmax-neonfma-acc2.c.o [ 50%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-9p16c-minmax-neonfma.c.o [ 50%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/ForeachOpsKernels.cpp.o [ 50%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-25p4c-minmax-neonfma-acc2.c.o [ 50%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-25p4c-minmax-neonfma.c.o [ 50%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-25p8c-minmax-neonfma-acc2.c.o [ 50%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-25p8c-minmax-neonfma.c.o [ 50%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-25p16c-minmax-neonfma-acc2.c.o [ 50%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-25p16c-minmax-neonfma.c.o [ 50%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/FractionalMaxPool2d.cpp.o [ 51%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-1x8-minmax-neonfma-dup-ld64.c.o [ 51%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-1x8s4-minmax-neonfma.c.o [ 51%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-4x8-minmax-neonfma-dup-ld64.c.o [ 51%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-4x8-minmax-neonfma-dup-ld128.c.o [ 51%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-4x8s4-minmax-neonfma.c.o [ 51%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-6x8-minmax-neonfma-dup-ld64.c.o [ 51%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-6x8-minmax-neonfma-dup-ld128.c.o [ 51%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-6x8s4-minmax-neonfma.c.o [ 51%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-8x8s4-minmax-neonfma.c.o [ 51%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemminc/gen/f32-gemminc-1x8-minmax-neonfma-dup-ld64.c.o [ 51%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemminc/gen/f32-gemminc-1x8s4-minmax-neonfma.c.o [ 51%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemminc/gen/f32-gemminc-4x8-minmax-neonfma-dup-ld64.c.o [ 51%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemminc/gen/f32-gemminc-4x8-minmax-neonfma-dup-ld128.c.o [ 51%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemminc/gen/f32-gemminc-4x8s4-minmax-neonfma.c.o [ 51%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/FractionalMaxPool3d.cpp.o [ 51%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemminc/gen/f32-gemminc-6x8-minmax-neonfma-dup-ld64.c.o [ 51%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/FunctionOfAMatrixUtils.cpp.o [ 51%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemminc/gen/f32-gemminc-6x8-minmax-neonfma-dup-ld128.c.o [ 51%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemminc/gen/f32-gemminc-6x8s4-minmax-neonfma.c.o [ 51%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemminc/gen/f32-gemminc-8x8s4-minmax-neonfma.c.o [ 51%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-ibilinear-chw/gen/f32-ibilinear-chw-neonfma-p4.c.o [ 51%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-ibilinear-chw/gen/f32-ibilinear-chw-neonfma-p8.c.o [ 51%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-ibilinear-chw/gen/f32-ibilinear-chw-neonfma-p16.c.o [ 51%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-ibilinear/gen/f32-ibilinear-neonfma-c4.c.o [ 51%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-ibilinear/gen/f32-ibilinear-neonfma-c8.c.o [ 51%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-1x8-minmax-neonfma-dup-ld64.c.o [ 51%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-1x8s4-minmax-neonfma.c.o [ 51%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-4x8-minmax-neonfma-dup-ld64.c.o [ 51%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-4x8-minmax-neonfma-dup-ld128.c.o [ 51%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-4x8s4-minmax-neonfma.c.o [ 51%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-6x8-minmax-neonfma-dup-ld64.c.o [ 51%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-6x8-minmax-neonfma-dup-ld128.c.o [ 51%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-6x8s4-minmax-neonfma.c.o [ 51%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-8x8s4-minmax-neonfma.c.o [ 51%] 
Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc4w-gemm/gen/f32-qc4w-gemm-1x8-minmax-neonfma-dup-ld64.c.o [ 51%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc4w-gemm/gen/f32-qc4w-gemm-4x8-minmax-neonfma-dup-ld64.c.o [ 51%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc4w-gemm/gen/f32-qc4w-gemm-6x8-minmax-neonfma-dup-ld64.c.o [ 51%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-1x8-minmax-neonfma-dup-ld64.c.o [ 51%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-1x8s4-minmax-neonfma.c.o [ 51%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-4x8-minmax-neonfma-dup-ld64.c.o [ 51%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-4x8s4-minmax-neonfma.c.o [ 51%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-6x8-minmax-neonfma-dup-ld64.c.o [ 51%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-6x8s4-minmax-neonfma.c.o [ 51%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddstoreexpminusmax/gen/f32-raddstoreexpminusmax-neonfma-rr1-lut64-p2-u4.c.o [ 51%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/GatedLinearUnit.cpp.o [ 51%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddstoreexpminusmax/gen/f32-raddstoreexpminusmax-neonfma-rr1-lut64-p2-u8-acc2.c.o [ 51%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddstoreexpminusmax/gen/f32-raddstoreexpminusmax-neonfma-rr1-lut64-p2-u8.c.o [ 51%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddstoreexpminusmax/gen/f32-raddstoreexpminusmax-neonfma-rr1-lut64-p2-u12-acc2.c.o [ 51%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddstoreexpminusmax/gen/f32-raddstoreexpminusmax-neonfma-rr1-lut64-p2-u12-acc3.c.o [ 51%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddstoreexpminusmax/gen/f32-raddstoreexpminusmax-neonfma-rr1-lut64-p2-u12.c.o [ 51%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddstoreexpminusmax/gen/f32-raddstoreexpminusmax-neonfma-rr1-lut64-p2-u16-acc2.c.o [ 51%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddstoreexpminusmax/gen/f32-raddstoreexpminusmax-neonfma-rr1-lut64-p2-u16-acc4.c.o [ 51%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddstoreexpminusmax/gen/f32-raddstoreexpminusmax-neonfma-rr1-lut64-p2-u16.c.o [ 51%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddstoreexpminusmax/gen/f32-raddstoreexpminusmax-neonfma-rr1-lut64-p2-u20-acc2.c.o [ 51%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddstoreexpminusmax/gen/f32-raddstoreexpminusmax-neonfma-rr1-lut64-p2-u20-acc5.c.o [ 51%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/GridSampler.cpp.o [ 51%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddstoreexpminusmax/gen/f32-raddstoreexpminusmax-neonfma-rr1-lut64-p2-u20.c.o [ 51%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddstoreexpminusmax/gen/f32-raddstoreexpminusmax-neonfma-rr1-p5-u4.c.o [ 51%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddstoreexpminusmax/gen/f32-raddstoreexpminusmax-neonfma-rr1-p5-u8-acc2.c.o [ 51%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddstoreexpminusmax/gen/f32-raddstoreexpminusmax-neonfma-rr1-p5-u8.c.o [ 51%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddstoreexpminusmax/gen/f32-raddstoreexpminusmax-neonfma-rr1-p5-u12-acc2.c.o [ 51%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddstoreexpminusmax/gen/f32-raddstoreexpminusmax-neonfma-rr1-p5-u12-acc3.c.o [ 51%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddstoreexpminusmax/gen/f32-raddstoreexpminusmax-neonfma-rr1-p5-u12.c.o [ 51%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddstoreexpminusmax/gen/f32-raddstoreexpminusmax-neonfma-rr1-p5-u16-acc2.c.o [ 52%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddstoreexpminusmax/gen/f32-raddstoreexpminusmax-neonfma-rr1-p5-u16-acc4.c.o [ 52%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddstoreexpminusmax/gen/f32-raddstoreexpminusmax-neonfma-rr1-p5-u16.c.o [ 52%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddstoreexpminusmax/gen/f32-raddstoreexpminusmax-neonfma-rr1-p5-u20-acc2.c.o [ 52%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddstoreexpminusmax/gen/f32-raddstoreexpminusmax-neonfma-rr1-p5-u20-acc5.c.o [ 52%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddstoreexpminusmax/gen/f32-raddstoreexpminusmax-neonfma-rr1-p5-u20.c.o [ 52%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-spmm/gen/f32-spmm-4x1-minmax-neonfma-pipelined.c.o [ 52%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-spmm/gen/f32-spmm-4x1-minmax-neonfma-x2.c.o [ 52%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-spmm/gen/f32-spmm-4x1-minmax-neonfma.c.o [ 52%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-spmm/gen/f32-spmm-8x1-minmax-neonfma-pipelined.c.o [ 52%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-spmm/gen/f32-spmm-8x1-minmax-neonfma-x2.c.o [ 52%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-spmm/gen/f32-spmm-8x1-minmax-neonfma.c.o [ 52%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-spmm/gen/f32-spmm-12x1-minmax-neonfma.c.o [ 52%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-spmm/gen/f32-spmm-16x1-minmax-neonfma-pipelined.c.o [ 52%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-spmm/gen/f32-spmm-16x1-minmax-neonfma-x2.c.o [ 52%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-spmm/gen/f32-spmm-16x1-minmax-neonfma.c.o [ 52%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-spmm/gen/f32-spmm-32x1-minmax-neonfma-pipelined.c.o [ 52%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-spmm/gen/f32-spmm-32x1-minmax-neonfma-x2.c.o [ 52%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-spmm/gen/f32-spmm-32x1-minmax-neonfma.c.o [ 52%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/Histogram.cpp.o [ 52%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-velu/gen/f32-velu-neonfma-rr1-lut16-p3-u4.c.o [ 52%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-velu/gen/f32-velu-neonfma-rr1-lut16-p3-u8.c.o [ 52%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-velu/gen/f32-velu-neonfma-rr1-lut16-p3-u12.c.o [ 52%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-velu/gen/f32-velu-neonfma-rr1-lut16-p3-u16.c.o [ 52%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-velu/gen/f32-velu-neonfma-rr1-lut16-p3-u20.c.o [ 52%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-velu/gen/f32-velu-neonfma-rr1-lut16-p3-u24.c.o [ 52%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-velu/gen/f32-velu-neonfma-rr1-p6-u4.c.o [ 52%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-velu/gen/f32-velu-neonfma-rr1-p6-u8.c.o [ 52%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-velu/gen/f32-velu-neonfma-rr1-p6-u12.c.o [ 52%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-velu/gen/f32-velu-neonfma-rr1-p6-u16.c.o [ 52%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-velu/gen/f32-velu-neonfma-rr1-p6-u20.c.o [ 52%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-velu/gen/f32-velu-neonfma-rr1-p6-u24.c.o [ 52%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/Im2Col.cpp.o [ 52%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vmulcaddc/gen/f32-vmulcaddc-c4-minmax-neonfma-2x.c.o [ 52%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vmulcaddc/gen/f32-vmulcaddc-c8-minmax-neonfma-2x.c.o [ 52%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-neonfma-rr1-lut64-p2-nr1recps1fma-u4.c.o [ 52%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/IndexingUtils.cpp.o [ 52%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-neonfma-rr1-lut64-p2-nr1recps1fma-u8.c.o [ 52%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-neonfma-rr1-lut64-p2-nr1recps1fma-u12.c.o [ 52%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-neonfma-rr1-lut64-p2-nr1recps1fma-u16.c.o [ 52%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-neonfma-rr1-lut64-p2-nr1recps1fma-u20.c.o [ 52%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-neonfma-rr1-lut64-p2-nr1recps1fma-u24.c.o [ 52%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-neonfma-rr1-lut64-p2-nr2fma-u4.c.o [ 52%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-neonfma-rr1-lut64-p2-nr2fma-u8.c.o [ 52%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-neonfma-rr1-lut64-p2-nr2fma-u12.c.o [ 
52%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-neonfma-rr1-lut64-p2-nr2fma-u16.c.o [ 52%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-neonfma-rr1-lut64-p2-nr2fma-u20.c.o [ 52%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/Integration.cpp.o [ 52%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-neonfma-rr1-lut64-p2-nr2fma-u24.c.o [ 52%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-neonfma-rr1-lut64-p2-nr2recps-u4.c.o [ 52%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-neonfma-rr1-lut64-p2-nr2recps-u8.c.o [ 52%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-neonfma-rr1-lut64-p2-nr2recps-u12.c.o [ 52%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/Itertools.cpp.o [ 52%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-neonfma-rr1-lut64-p2-nr2recps-u16.c.o [ 52%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/LegacyBatching.cpp.o [ 52%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-neonfma-rr1-lut64-p2-nr2recps-u20.c.o [ 52%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-neonfma-rr1-lut64-p2-nr2recps-u24.c.o [ 52%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-neonfma-rr1-lut2048-p1-nr1recps1fma-u4.c.o [ 52%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-neonfma-rr1-lut2048-p1-nr1recps1fma-u8.c.o [ 52%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-neonfma-rr1-lut2048-p1-nr1recps1fma-u12.c.o [ 52%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-neonfma-rr1-lut2048-p1-nr1recps1fma-u16.c.o [ 52%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-neonfma-rr1-lut2048-p1-nr1recps1fma-u20.c.o [ 52%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-neonfma-rr1-lut2048-p1-nr1recps1fma-u24.c.o [ 52%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-neonfma-rr1-lut2048-p1-nr2fma-u4.c.o [ 52%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-neonfma-rr1-lut2048-p1-nr2fma-u8.c.o [ 52%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-neonfma-rr1-lut2048-p1-nr2fma-u12.c.o [ 53%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-neonfma-rr1-lut2048-p1-nr2fma-u16.c.o [ 53%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/LegacyBridge.cpp.o [ 53%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-neonfma-rr1-lut2048-p1-nr2fma-u20.c.o [ 53%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/Lerp.cpp.o [ 53%] Building CXX object 
caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/Linear.cpp.o [ 53%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-neonfma-rr1-lut2048-p1-nr2fma-u24.c.o [ 53%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/LinearAlgebra.cpp.o [ 53%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-neonfma-rr1-lut2048-p1-nr2recps-u4.c.o [ 53%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-neonfma-rr1-lut2048-p1-nr2recps-u8.c.o [ 53%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-neonfma-rr1-lut2048-p1-nr2recps-u12.c.o [ 53%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-neonfma-rr1-lut2048-p1-nr2recps-u16.c.o [ 53%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-neonfma-rr1-lut2048-p1-nr2recps-u20.c.o [ 53%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-neonfma-rr1-lut2048-p1-nr2recps-u24.c.o [ 53%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/Loss.cpp.o [ 53%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-neonfma-rr1-p5-nr1recps1fma-u4.c.o [ 53%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-neonfma-rr1-p5-nr1recps1fma-u8.c.o [ 53%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-neonfma-rr1-p5-nr1recps1fma-u12.c.o [ 53%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-neonfma-rr1-p5-nr1recps1fma-u16.c.o [ 53%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-neonfma-rr1-p5-nr1recps1fma-u20.c.o [ 53%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-neonfma-rr1-p5-nr1recps1fma-u24.c.o [ 53%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-neonfma-rr1-p5-nr2fma-u4.c.o [ 53%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-neonfma-rr1-p5-nr2fma-u8.c.o [ 53%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-neonfma-rr1-p5-nr2fma-u12.c.o [ 53%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-neonfma-rr1-p5-nr2fma-u16.c.o [ 53%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-neonfma-rr1-p5-nr2fma-u20.c.o [ 53%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-neonfma-rr1-p5-nr2fma-u24.c.o [ 53%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-neonfma-rr1-p5-nr2recps-u4.c.o [ 53%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-neonfma-rr1-p5-nr2recps-u8.c.o [ 53%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-neonfma-rr1-p5-nr2recps-u12.c.o [ 53%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-neonfma-rr1-p5-nr2recps-u16.c.o [ 53%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-neonfma-rr1-p5-nr2recps-u20.c.o [ 53%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/LossCTC.cpp.o [ 53%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-neonfma-rr1-p5-nr2recps-u24.c.o [ 53%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsqrt/gen/f32-vsqrt-neonfma-nr1rsqrts1fma1adj-u4.c.o [ 53%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsqrt/gen/f32-vsqrt-neonfma-nr1rsqrts1fma1adj-u8.c.o [ 53%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsqrt/gen/f32-vsqrt-neonfma-nr1rsqrts1fma1adj-u16.c.o [ 53%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsqrt/gen/f32-vsqrt-neonfma-nr2fma1adj-u4.c.o [ 53%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsqrt/gen/f32-vsqrt-neonfma-nr2fma1adj-u8.c.o [ 53%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/LossMultiLabelMargin.cpp.o [ 53%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsqrt/gen/f32-vsqrt-neonfma-nr2fma1adj-u16.c.o [ 53%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-neonfma-expm1minus-rr1-lut8-p4h3ts-nr1recps1fma-u4.c.o [ 53%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-neonfma-expm1minus-rr1-lut8-p4h3ts-nr1recps1fma-u8.c.o [ 53%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-neonfma-expm1minus-rr1-lut8-p4h3ts-nr1recps1fma-u12.c.o [ 53%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-neonfma-expm1minus-rr1-lut8-p4h3ts-nr1recps1fma-u16.c.o [ 53%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-neonfma-expm1minus-rr1-lut8-p4h3ts-nr2fma-u4.c.o [ 53%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-neonfma-expm1minus-rr1-lut8-p4h3ts-nr2fma-u8.c.o [ 53%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-neonfma-expm1minus-rr1-lut8-p4h3ts-nr2fma-u12.c.o [ 53%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-neonfma-expm1minus-rr1-lut8-p4h3ts-nr2fma-u16.c.o [ 53%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-neonfma-expm1minus-rr1-p6h5ts-nr1recps1fma-u4.c.o [ 53%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-neonfma-expm1minus-rr1-p6h5ts-nr1recps1fma-u8.c.o [ 53%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-neonfma-expm1minus-rr1-p6h5ts-nr1recps1fma-u12.c.o [ 53%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-neonfma-expm1minus-rr1-p6h5ts-nr1recps1fma-u16.c.o [ 53%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-neonfma-expm1minus-rr1-p6h5ts-nr2fma-u4.c.o [ 53%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-neonfma-expm1minus-rr1-p6h5ts-nr2fma-u8.c.o [ 53%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-neonfma-expm1minus-rr1-p6h5ts-nr2fma-u12.c.o [ 53%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/LossMultiMargin.cpp.o [ 53%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-neonfma-expm1minus-rr1-p6h5ts-nr2fma-u16.c.o [ 53%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-neonfma-expm1minus-rr1-p6h5ts-nr2recps-u4.c.o [ 53%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/LossNLL.cpp.o [ 53%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-neonfma-expm1minus-rr1-p6h5ts-nr2recps-u8.c.o [ 53%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-neonfma-expm1minus-rr1-p6h5ts-nr2recps-u12.c.o [ 53%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-neonfma-expm1minus-rr1-p6h5ts-nr2recps-u16.c.o [ 53%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/LossNLL2d.cpp.o [ 53%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-exp-neonfma-rr2-lut64-p2.c.o [ 53%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-exp-neonfma-rr2-p5.c.o [ 53%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-expm1minus-neonfma-rr1-lut16-p3.c.o [ 53%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-expm1minus-neonfma-rr1-p6.c.o [ 53%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-expminus-neonfma-rr2-lut64-p2.c.o [ 53%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-expminus-neonfma-rr2-lut2048-p1.c.o [ 53%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-expminus-neonfma-rr2-p5.c.o [ 54%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-sigmoid-neonfma-rr1-lut64-p2-nr1recps1fma.c.o [ 54%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-sigmoid-neonfma-rr1-lut64-p2-nr2fma.c.o [ 54%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-sigmoid-neonfma-rr1-lut64-p2-nr2recps.c.o [ 54%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-sigmoid-neonfma-rr1-lut2048-p1-nr1recps1fma.c.o [ 54%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-sigmoid-neonfma-rr1-lut2048-p1-nr2fma.c.o [ 54%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-sigmoid-neonfma-rr1-lut2048-p1-nr2recps.c.o [ 54%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-sigmoid-neonfma-rr1-p5-nr1recps1fma.c.o [ 54%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/MaxPooling.cpp.o [ 54%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-sigmoid-neonfma-rr1-p5-nr2fma.c.o [ 54%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-sigmoid-neonfma-rr1-p5-nr2recps.c.o [ 54%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-sigmoid-neonfma-rr2-lut64-p2-nr1recps1fma.c.o [ 54%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-sigmoid-neonfma-rr2-lut64-p2-nr2fma.c.o [ 54%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-sigmoid-neonfma-rr2-lut64-p2-nr2recps.c.o [ 54%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-sigmoid-neonfma-rr2-lut2048-p1-nr1recps1fma.c.o [ 54%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-sigmoid-neonfma-rr2-lut2048-p1-nr2fma.c.o [ 54%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-sigmoid-neonfma-rr2-lut2048-p1-nr2recps.c.o [ 54%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-sigmoid-neonfma-rr2-p5-nr1recps1fma.c.o [ 54%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-sigmoid-neonfma-rr2-p5-nr2fma.c.o [ 54%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-sigmoid-neonfma-rr2-p5-nr2recps.c.o [ 54%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-sqrt-neonfma-nr1fma.c.o [ 54%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-sqrt-neonfma-nr1rsqrts1fma1adj.c.o [ 54%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-sqrt-neonfma-nr2fma1adj.c.o [ 54%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/MaxUnpooling.cpp.o [ 54%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-sqrt-neonfma-nr2fma.c.o [ 54%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/Memory.cpp.o [ 54%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-sqrt-neonfma-nr3fma.c.o [ 54%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-neonfma-expm1minus-rr1-lut8-p4h2ts-nr1recps1fma.c.o [ 54%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-neonfma-expm1minus-rr1-lut8-p4h2ts-nr2fma.c.o [ 54%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/MetaTensor.cpp.o [ 54%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-neonfma-expm1minus-rr1-lut8-p4h2ts-nr2recps.c.o [ 54%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-neonfma-expm1minus-rr1-lut8-p4h3ps-nr1recps1fma.c.o [ 54%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-neonfma-expm1minus-rr1-lut8-p4h3ps-nr1recps1fmaadj.c.o [ 54%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-neonfma-expm1minus-rr1-lut8-p4h3ps-nr2fma.c.o [ 54%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-neonfma-expm1minus-rr1-lut8-p4h3ps-nr2fmaadj.c.o [ 54%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-neonfma-expm1minus-rr1-lut8-p4h3ps-nr2recps.c.o [ 54%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-neonfma-expm1minus-rr1-lut8-p4h3ps-nr2recpsadj.c.o [ 54%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-neonfma-expm1minus-rr1-p6h5ts-nr1recps1fma.c.o [ 54%] Building CXX object 
caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/NNPACK.cpp.o [ 54%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-neonfma-expm1minus-rr1-p6h5ts-nr1recps1fmaadj.c.o [ 54%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-neonfma-expm1minus-rr1-p6h5ts-nr2fma.c.o [ 54%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-neonfma-expm1minus-rr1-p6h5ts-nr2fmaadj.c.o [ 55%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/NaiveConvolutionTranspose2d.cpp.o [ 55%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-neonfma-expm1minus-rr1-p6h5ts-nr2recps.c.o [ 55%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-neonfma-expm1minus-rr1-p6h5ts-nr2recpsadj.c.o [ 55%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qs8-vcvt/gen/f32-qs8-vcvt-neonv8-u8.c.o [ 55%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/NaiveConvolutionTranspose3d.cpp.o [ 55%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qs8-vcvt/gen/f32-qs8-vcvt-neonv8-u16.c.o [ 55%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qs8-vcvt/gen/f32-qs8-vcvt-neonv8-u24.c.o [ 55%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qs8-vcvt/gen/f32-qs8-vcvt-neonv8-u32.c.o [ 55%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qu8-vcvt/gen/f32-qu8-vcvt-neonv8-u8.c.o [ 55%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qu8-vcvt/gen/f32-qu8-vcvt-neonv8-u16.c.o [ 55%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qu8-vcvt/gen/f32-qu8-vcvt-neonv8-u24.c.o [ 55%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qu8-vcvt/gen/f32-qu8-vcvt-neonv8-u32.c.o [ 55%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vrnd/gen/f32-vrndd-neonv8-u4.c.o [ 55%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vrnd/gen/f32-vrndd-neonv8-u8.c.o [ 55%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vrnd/gen/f32-vrndne-neonv8-u4.c.o [ 55%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vrnd/gen/f32-vrndne-neonv8-u8.c.o [ 55%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/NaiveDilatedConvolution.cpp.o [ 55%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vrnd/gen/f32-vrndu-neonv8-u4.c.o [ 55%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vrnd/gen/f32-vrndu-neonv8-u8.c.o [ 55%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vrnd/gen/f32-vrndz-neonv8-u4.c.o [ 55%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vrnd/gen/f32-vrndz-neonv8-u8.c.o [ 55%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-qs8-cvt-neonv8.c.o [ 55%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-qu8-cvt-neonv8.c.o [ 55%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-roundd-neonv8.c.o [ 55%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-roundne-neonv8.c.o [ 55%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-roundu-neonv8.c.o
[ 55%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-roundz-neonv8.c.o
[ 56%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-5f5m5l8c8s8r-minmax-fp32-neonv8-mul16.c.o
[ 56%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-5f5m5l16c8s8r-minmax-fp32-neonv8-mul16.c.o
[ 56%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/NamedTensor.cpp.o
[ 56%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-5f5m5l32c8s8r-minmax-fp32-neonv8-mul16.c.o
[ 56%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/NegateFallback.cpp.o
[ 56%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-6f6m7l8c8s8r-minmax-fp32-neonv8-mul16.c.o
[ 56%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-6f6m7l16c8s8r-minmax-fp32-neonv8-mul16.c.o
[ 56%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-6f6m7l32c8s8r-minmax-fp32-neonv8-mul16.c.o
[ 56%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-8f8m9l8c8s8r-minmax-fp32-neonv8-mul16.c.o
[ 56%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-8f8m9l16c8s8r-minmax-fp32-neonv8-mul16.c.o
[ 56%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-8f8m9l32c8s8r-minmax-fp32-neonv8-mul16.c.o
[ 56%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-9p8c-minmax-fp32-neonv8-mul16.c.o
[ 56%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-9p16c-minmax-fp32-neonv8-mul16.c.o
[ 56%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-9p32c-minmax-fp32-neonv8-mul16.c.o
[ 56%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-25p8c-minmax-fp32-neonv8-mul16.c.o
[ 56%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-25p16c-minmax-fp32-neonv8-mul16.c.o
[ 56%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/Normalization.cpp.o
[ 56%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-25p32c-minmax-fp32-neonv8-mul16.c.o
[ 56%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/Onehot.cpp.o
[ 56%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-gavgpool/gen/qs8-gavgpool-7p7x-minmax-fp32-neonv8-c8.c.o
[ 56%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-gavgpool/gen/qs8-gavgpool-7p7x-minmax-fp32-neonv8-c16.c.o
[ 56%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-gavgpool/gen/qs8-gavgpool-7p7x-minmax-fp32-neonv8-c24.c.o
[ 56%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-gavgpool/gen/qs8-gavgpool-7p7x-minmax-fp32-neonv8-c32.c.o
[ 56%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/PackedSequence.cpp.o
[ 56%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-gavgpool/gen/qs8-gavgpool-7x-minmax-fp32-neonv8-c8.c.o
[ 56%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-gavgpool/gen/qs8-gavgpool-7x-minmax-fp32-neonv8-c16.c.o
[ 56%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-gavgpool/gen/qs8-gavgpool-7x-minmax-fp32-neonv8-c24.c.o
[ 56%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-gavgpool/gen/qs8-gavgpool-7x-minmax-fp32-neonv8-c32.c.o
[ 56%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-3p8c-minmax-fp32-neonv8-mla8-ld64.c.o
[ 56%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-3p16c-minmax-fp32-neonv8-mla8-ld64.c.o
[ 56%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-3p16c-minmax-fp32-neonv8-mla8-ld128.c.o
[ 56%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-5f5m5l8c8s8r-minmax-fp32-neonv8-mla8-ld64.c.o
[ 56%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-5f5m5l8c8s8r-minmax-fp32-neonv8-mul8-ld64.c.o
[ 56%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-5f5m5l8c8s8r-minmax-fp32-neonv8-mul16.c.o
[ 56%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-5f5m5l16c8s8r-minmax-fp32-neonv8-mla8-ld64.c.o
[ 56%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/PadNd.cpp.o
[ 56%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-5f5m5l16c8s8r-minmax-fp32-neonv8-mla8-ld128.c.o
[ 56%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-5f5m5l16c8s8r-minmax-fp32-neonv8-mul8-ld64.c.o
[ 56%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-5f5m5l16c8s8r-minmax-fp32-neonv8-mul8-ld128.c.o
[ 56%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-5f5m5l16c8s8r-minmax-fp32-neonv8-mul16.c.o
[ 56%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-5f5m5l32c8s8r-minmax-fp32-neonv8-mul16.c.o
[ 56%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/PixelShuffle.cpp.o
[ 56%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-6f6m7l8c8s8r-minmax-fp32-neonv8-mla8-ld64.c.o
[ 56%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-6f6m7l8c8s8r-minmax-fp32-neonv8-mul8-ld64.c.o
[ 56%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-6f6m7l8c8s8r-minmax-fp32-neonv8-mul16.c.o
[ 56%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-6f6m7l16c8s8r-minmax-fp32-neonv8-mla8-ld64.c.o
[ 56%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-6f6m7l16c8s8r-minmax-fp32-neonv8-mla8-ld128.c.o
[ 56%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-6f6m7l16c8s8r-minmax-fp32-neonv8-mul8-ld64.c.o
[ 56%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-6f6m7l16c8s8r-minmax-fp32-neonv8-mul8-ld128.c.o
[ 56%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-6f6m7l16c8s8r-minmax-fp32-neonv8-mul16.c.o
[ 56%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-6f6m7l32c8s8r-minmax-fp32-neonv8-mul16.c.o
[ 56%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-8f8m9l8c8s8r-minmax-fp32-neonv8-mla8-ld64.c.o
[ 56%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/PointwiseOps.cpp.o
[ 56%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-8f8m9l8c8s8r-minmax-fp32-neonv8-mul8-ld64.c.o
[ 56%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/Pooling.cpp.o
[ 56%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-8f8m9l8c8s8r-minmax-fp32-neonv8-mul16.c.o
[ 56%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-8f8m9l16c8s8r-minmax-fp32-neonv8-mla8-ld64.c.o
[ 56%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-8f8m9l16c8s8r-minmax-fp32-neonv8-mla8-ld128.c.o
[ 56%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-8f8m9l16c8s8r-minmax-fp32-neonv8-mul8-ld64.c.o
[ 56%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-8f8m9l16c8s8r-minmax-fp32-neonv8-mul8-ld128.c.o
[ 56%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/Pow.cpp.o
[ 56%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/QuantizedLinear.cpp.o
[ 56%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-8f8m9l16c8s8r-minmax-fp32-neonv8-mul16.c.o
[ 56%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-8f8m9l32c8s8r-minmax-fp32-neonv8-mul16.c.o
[ 56%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-9p8c-minmax-fp32-neonv8-mla8-ld64.c.o
[ 56%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-9p8c-minmax-fp32-neonv8-mul8-ld64.c.o
[ 56%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-9p8c-minmax-fp32-neonv8-mul16.c.o
[ 56%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-9p16c-minmax-fp32-neonv8-mla8-ld64.c.o
[ 56%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/RNN.cpp.o
[ 56%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-9p16c-minmax-fp32-neonv8-mla8-ld128.c.o
[ 56%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-9p16c-minmax-fp32-neonv8-mul8-ld64.c.o
[ 57%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-9p16c-minmax-fp32-neonv8-mul8-ld128.c.o
[ 57%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-9p16c-minmax-fp32-neonv8-mul16.c.o
[ 57%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-9p32c-minmax-fp32-neonv8-mul16.c.o
[ 57%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/RangeFactories.cpp.o
[ 57%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-25p8c-minmax-fp32-neonv8-mla8-ld64.c.o
[ 57%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-25p8c-minmax-fp32-neonv8-mul8-ld64.c.o
[ 57%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/ReduceAllOps.cpp.o
[ 57%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-25p8c-minmax-fp32-neonv8-mul16.c.o
[ 57%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-25p16c-minmax-fp32-neonv8-mla8-ld64.c.o
[ 57%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-25p16c-minmax-fp32-neonv8-mla8-ld128.c.o
[ 57%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-25p16c-minmax-fp32-neonv8-mul8-ld64.c.o
[ 57%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-25p16c-minmax-fp32-neonv8-mul8-ld128.c.o
[ 57%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-25p16c-minmax-fp32-neonv8-mul16.c.o
[ 57%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-25p32c-minmax-fp32-neonv8-mul16.c.o
[ 57%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-1x8-minmax-fp32-neonv8-mlal-lane-prfm.c.o
[ 57%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-1x8-minmax-fp32-neonv8-mlal-lane.c.o
[ 57%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-1x8c2-minmax-fp32-neonv8-mlal-dup.c.o
[ 57%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-1x8c2-minmax-fp32-neonv8-mlal-ld1r.c.o
[ 57%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/ReduceOps.cpp.o
[ 57%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-1x8c2-minmax-fp32-neonv8-mlal-ld2r.c.o
[ 57%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-1x8c2-minmax-fp32-neonv8-mlal-ld4r.c.o
[ 57%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-1x8c2s4-minmax-fp32-neonv8-mlal.c.o
[ 57%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-1x8c4-minmax-fp32-neonv8-mlal-dup.c.o
[ 57%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-1x8c4-minmax-fp32-neonv8-mlal-ld1r.c.o
[ 57%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-1x8c4-minmax-fp32-neonv8-mlal-ld2r.c.o
[ 57%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-1x8c4s2-minmax-fp32-neonv8-mlal.c.o
[ 57%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/ReflectionPad.cpp.o
[ 57%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-1x8c8-minmax-fp32-neonv8-mlal.c.o
[ 57%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-1x16-minmax-fp32-neonv8-mlal-lane-prfm.c.o
[ 57%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-1x16-minmax-fp32-neonv8-mlal-lane.c.o
[ 57%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-2x8-minmax-fp32-neonv8-mlal-lane-prfm.c.o
[ 57%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-2x8-minmax-fp32-neonv8-mlal-lane.c.o
[ 57%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-2x8c2-minmax-fp32-neonv8-mlal-dup.c.o
[ 57%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-2x8c2-minmax-fp32-neonv8-mlal-ld1r.c.o
[ 57%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-2x8c2-minmax-fp32-neonv8-mlal-ld2r.c.o
[ 57%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-2x8c2-minmax-fp32-neonv8-mlal-ld4r.c.o
[ 57%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-2x8c2s4-minmax-fp32-neonv8-mlal.c.o
[ 57%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-2x8c4-minmax-fp32-neonv8-mlal-dup.c.o
[ 57%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-2x8c4-minmax-fp32-neonv8-mlal-ld1r.c.o
[ 57%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-2x8c4-minmax-fp32-neonv8-mlal-ld2r.c.o
[ 57%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-2x8c4s2-minmax-fp32-neonv8-mlal.c.o
[ 57%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-2x8c8-minmax-fp32-neonv8-mlal.c.o
[ 57%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/Repeat.cpp.o
[ 57%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-2x16-minmax-fp32-neonv8-mlal-lane-prfm.c.o
[ 57%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-2x16-minmax-fp32-neonv8-mlal-lane.c.o
[ 57%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-3x8-minmax-fp32-neonv8-mlal-lane-prfm.c.o
[ 57%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-3x8-minmax-fp32-neonv8-mlal-lane.c.o
[ 57%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-3x16-minmax-fp32-neonv8-mlal-lane-prfm.c.o
[ 57%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-3x16-minmax-fp32-neonv8-mlal-lane.c.o
[ 57%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/ReplicationPadding.cpp.o
[ 57%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-4x8-minmax-fp32-neonv8-mlal-lane-prfm.c.o
[ 57%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-4x8-minmax-fp32-neonv8-mlal-lane.c.o
[ 57%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-4x16-minmax-fp32-neonv8-mlal-lane-prfm.c.o
[ 57%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/Resize.cpp.o
[ 57%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-4x16-minmax-fp32-neonv8-mlal-lane.c.o
[ 57%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-6x8-minmax-fp32-neonv8-mlal-lane-prfm.c.o
[ 57%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-6x8-minmax-fp32-neonv8-mlal-lane.c.o
[ 57%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/RowwisePrune.cpp.o
[ 57%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-6x16-minmax-fp32-neonv8-mlal-lane-prfm.c.o
[ 57%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-6x16-minmax-fp32-neonv8-mlal-lane.c.o
[ 57%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-1x8-minmax-fp32-neonv8-mlal-lane-prfm.c.o
[ 57%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/Scalar.cpp.o
[ 57%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-1x8-minmax-fp32-neonv8-mlal-lane.c.o
[ 57%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-1x8c2-minmax-fp32-neonv8-mlal-dup.c.o
[ 57%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-1x8c2-minmax-fp32-neonv8-mlal-ld1r.c.o
[ 57%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-1x8c2-minmax-fp32-neonv8-mlal-ld2r.c.o
[ 57%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-1x8c2-minmax-fp32-neonv8-mlal-ld4r.c.o
[ 57%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-1x8c2s4-minmax-fp32-neonv8-mlal.c.o
[ 57%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-1x8c4-minmax-fp32-neonv8-mlal-dup.c.o
[ 58%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-1x8c4-minmax-fp32-neonv8-mlal-ld1r.c.o
[ 58%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/SegmentReduce.cpp.o
[ 58%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-1x8c4-minmax-fp32-neonv8-mlal-ld2r.c.o
[ 58%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/SobolEngineOps.cpp.o
[ 58%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-1x8c4s2-minmax-fp32-neonv8-mlal.c.o
[ 58%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-1x8c8-minmax-fp32-neonv8-mlal.c.o
[ 58%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-1x16-minmax-fp32-neonv8-mlal-lane-prfm.c.o
[ 58%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-1x16-minmax-fp32-neonv8-mlal-lane.c.o
[ 58%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-2x8-minmax-fp32-neonv8-mlal-lane-prfm.c.o
[ 58%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/SobolEngineOpsUtils.cpp.o
[ 58%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-2x8-minmax-fp32-neonv8-mlal-lane.c.o
[ 58%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-2x8c2-minmax-fp32-neonv8-mlal-dup.c.o
[ 58%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-2x8c2-minmax-fp32-neonv8-mlal-ld1r.c.o
[ 58%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-2x8c2-minmax-fp32-neonv8-mlal-ld2r.c.o
[ 58%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-2x8c2-minmax-fp32-neonv8-mlal-ld4r.c.o
[ 58%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-2x8c2s4-minmax-fp32-neonv8-mlal.c.o
[ 58%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-2x8c4-minmax-fp32-neonv8-mlal-dup.c.o
[ 58%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-2x8c4-minmax-fp32-neonv8-mlal-ld1r.c.o
[ 58%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-2x8c4-minmax-fp32-neonv8-mlal-ld2r.c.o
[ 58%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/SoftMax.cpp.o
[ 58%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-2x8c4s2-minmax-fp32-neonv8-mlal.c.o
[ 58%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-2x8c8-minmax-fp32-neonv8-mlal.c.o
[ 58%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-2x16-minmax-fp32-neonv8-mlal-lane-prfm.c.o
[ 58%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-2x16-minmax-fp32-neonv8-mlal-lane.c.o
[ 58%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-3x8-minmax-fp32-neonv8-mlal-lane-prfm.c.o
[ 58%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/Sorting.cpp.o
[ 58%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-3x8-minmax-fp32-neonv8-mlal-lane.c.o
[ 58%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-3x16-minmax-fp32-neonv8-mlal-lane-prfm.c.o
[ 58%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-3x16-minmax-fp32-neonv8-mlal-lane.c.o
[ 58%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/SparseTensorUtils.cpp.o
[ 58%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-4x8-minmax-fp32-neonv8-mlal-lane-prfm.c.o
[ 58%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-4x8-minmax-fp32-neonv8-mlal-lane.c.o
[ 58%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-4x16-minmax-fp32-neonv8-mlal-lane-prfm.c.o
[ 58%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-4x16-minmax-fp32-neonv8-mlal-lane.c.o
[ 58%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-6x8-minmax-fp32-neonv8-mlal-lane-prfm.c.o
[ 58%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-6x8-minmax-fp32-neonv8-mlal-lane.c.o
[ 58%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-6x16-minmax-fp32-neonv8-mlal-lane-prfm.c.o
[ 58%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/SpectralOps.cpp.o
[ 58%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-6x16-minmax-fp32-neonv8-mlal-lane.c.o
[ 58%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vmul/gen/qs8-vmul-minmax-fp32-neonv8-ld64-u8.c.o
[ 58%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vmul/gen/qs8-vmul-minmax-fp32-neonv8-ld64-u16.c.o
[ 58%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vmul/gen/qs8-vmul-minmax-fp32-neonv8-ld128-u16.c.o
[ 58%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vmulc/gen/qs8-vmulc-minmax-fp32-neonv8-ld64-u8.c.o
[ 58%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vmulc/gen/qs8-vmulc-minmax-fp32-neonv8-ld64-u16.c.o
[ 58%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vmulc/gen/qs8-vmulc-minmax-fp32-neonv8-ld128-u16.c.o
[ 58%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-5f5m5l8c8s8r-minmax-fp32-neonv8-mul16.c.o
[ 58%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-5f5m5l16c8s8r-minmax-fp32-neonv8-mul16.c.o
[ 58%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-5f5m5l32c8s8r-minmax-fp32-neonv8-mul16.c.o
[ 58%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-6f6m7l8c8s8r-minmax-fp32-neonv8-mul16.c.o
[ 58%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-6f6m7l16c8s8r-minmax-fp32-neonv8-mul16.c.o
[ 58%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-6f6m7l32c8s8r-minmax-fp32-neonv8-mul16.c.o
[ 58%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/SummaryOps.cpp.o
[ 58%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-8f8m9l8c8s8r-minmax-fp32-neonv8-mul16.c.o
[ 58%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-8f8m9l16c8s8r-minmax-fp32-neonv8-mul16.c.o
[ 58%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-8f8m9l32c8s8r-minmax-fp32-neonv8-mul16.c.o
[ 58%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-9p8c-minmax-fp32-neonv8-mul16.c.o
[ 58%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-9p16c-minmax-fp32-neonv8-mul16.c.o
[ 58%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/TensorAdvancedIndexing.cpp.o
[ 58%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-9p32c-minmax-fp32-neonv8-mul16.c.o
[ 58%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/TensorCompare.cpp.o
[ 58%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-25p8c-minmax-fp32-neonv8-mul16.c.o
[ 58%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-25p16c-minmax-fp32-neonv8-mul16.c.o
[ 58%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-25p32c-minmax-fp32-neonv8-mul16.c.o
[ 58%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gavgpool/gen/qu8-gavgpool-7p7x-minmax-fp32-neonv8-c8.c.o
[ 58%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gavgpool/gen/qu8-gavgpool-7p7x-minmax-fp32-neonv8-c16.c.o
[ 58%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gavgpool/gen/qu8-gavgpool-7p7x-minmax-fp32-neonv8-c24.c.o
[ 58%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gavgpool/gen/qu8-gavgpool-7p7x-minmax-fp32-neonv8-c32.c.o
[ 58%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gavgpool/gen/qu8-gavgpool-7x-minmax-fp32-neonv8-c8.c.o
[ 58%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gavgpool/gen/qu8-gavgpool-7x-minmax-fp32-neonv8-c16.c.o
[ 58%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/TensorConversions.cpp.o
[ 58%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gavgpool/gen/qu8-gavgpool-7x-minmax-fp32-neonv8-c24.c.o
[ 59%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gavgpool/gen/qu8-gavgpool-7x-minmax-fp32-neonv8-c32.c.o
[ 59%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-1x16-minmax-fp32-neonv8-mlal-lane.c.o
[ 59%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-4x16-minmax-fp32-neonv8-mlal-lane.c.o
[ 59%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-1x16-minmax-fp32-neonv8-mlal-lane.c.o
[ 59%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-4x16-minmax-fp32-neonv8-mlal-lane.c.o
[ 59%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vmul/gen/qu8-vmul-minmax-fp32-neonv8-ld64-u8.c.o
[ 59%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vmul/gen/qu8-vmul-minmax-fp32-neonv8-ld64-u16.c.o
[ 59%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vmul/gen/qu8-vmul-minmax-fp32-neonv8-ld128-u16.c.o
[ 59%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vmulc/gen/qu8-vmulc-minmax-fp32-neonv8-ld64-u8.c.o
[ 59%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vmulc/gen/qu8-vmulc-minmax-fp32-neonv8-ld64-u16.c.o
[ 59%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vmulc/gen/qu8-vmulc-minmax-fp32-neonv8-ld128-u16.c.o
[ 59%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/TensorFactories.cpp.o
[ 59%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vdiv-minmax-aarch64-neon-u4.c.o
[ 59%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vdiv-minmax-aarch64-neon-u8.c.o
[ 59%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vdivc-minmax-aarch64-neon-u4.c.o
[ 59%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vdivc-minmax-aarch64-neon-u8.c.o
[ 59%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vrdivc-minmax-aarch64-neon-u4.c.o
[ 59%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vrdivc-minmax-aarch64-neon-u8.c.o
[ 59%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsqrt/gen/f32-vsqrt-aarch64-neon-sqrt-u4.c.o
[ 59%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsqrt/gen/f32-vsqrt-aarch64-neon-sqrt-u8.c.o
[ 59%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsqrt/gen/f32-vsqrt-aarch64-neon-sqrt-u16.c.o
[ 59%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x8-lut/gen/x8-lut-aarch64-neon-tbx128x4-u16.c.o
[ 59%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x8-lut/gen/x8-lut-aarch64-neon-tbx128x4-u32.c.o
[ 59%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x8-lut/gen/x8-lut-aarch64-neon-tbx128x4-u48.c.o
[ 59%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x8-lut/gen/x8-lut-aarch64-neon-tbx128x4-u64.c.o
[ 59%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x24-transposec/x24-transposec-4x4-aarch64-neon-tbl128.c.o
[ 59%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-transposec/x32-transposec-4x4-aarch64-neon-tbl128.c.o
[ 59%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-conv-hwc2chw/f32-conv-hwc2chw-3x3s2p1c3x4-aarch64-neonfma-2x2.c.o
[ 59%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-conv-hwc/gen/f32-conv-hwc-3x3s2p0p1c3x4-aarch64-neonfma-2x1.c.o
[ 59%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-conv-hwc/gen/f32-conv-hwc-3x3s2p0p1c3x4-aarch64-neonfma-2x2.c.o
[ 59%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-conv-hwc/gen/f32-conv-hwc-3x3s2p0p1c3x8-aarch64-neonfma-2x1.c.o
[ 59%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-conv-hwc/gen/f32-conv-hwc-3x3s2p0p1c3x8-aarch64-neonfma-2x2.c.o
[ 59%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-conv-hwc/gen/f32-conv-hwc-3x3s2p1c3x4-aarch64-neonfma-2x1.c.o
[ 59%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-conv-hwc/gen/f32-conv-hwc-3x3s2p1c3x4-aarch64-neonfma-2x2.c.o
[ 59%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-conv-hwc/gen/f32-conv-hwc-3x3s2p1c3x8-aarch64-neonfma-2x1.c.o
[ 59%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-conv-hwc/gen/f32-conv-hwc-3x3s2p1c3x8-aarch64-neonfma-2x2.c.o
[ 59%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-3x3p1-minmax-aarch64-neonfma-1x4-acc2.c.o
[ 59%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-3x3p1-minmax-aarch64-neonfma-1x4-acc3.c.o
[ 59%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-3x3p1-minmax-aarch64-neonfma-1x4-acc4.c.o
[ 59%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-3x3p1-minmax-aarch64-neonfma-1x4.c.o
[ 59%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-3x3p1-minmax-aarch64-neonfma-2x4-acc2.c.o
[ 59%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-3x3p1-minmax-aarch64-neonfma-2x4.c.o
[ 59%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/TensorIteratorReduce.cpp.o
[ 59%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-3x3p1-minmax-aarch64-neonfma-3x4.c.o
[ 59%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-3x3p1-minmax-aarch64-neonfma-4x4.c.o
[ 59%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-3x3p1-minmax-aarch64-neonfma-5x4.c.o
[ 59%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-3x3p1-minmax-aarch64-neonfma-6x4.c.o
[ 59%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-3x3s2p1-minmax-aarch64-neonfma-1x4-acc2.c.o
[ 59%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-3x3s2p1-minmax-aarch64-neonfma-1x4-acc3.c.o
[ 59%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-3x3s2p1-minmax-aarch64-neonfma-1x4-acc4.c.o
[ 59%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-3x3s2p1-minmax-aarch64-neonfma-1x4.c.o
[ 59%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-3x3s2p1-minmax-aarch64-neonfma-2x4-acc2.c.o
[ 59%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-3x3s2p1-minmax-aarch64-neonfma-2x4.c.o
[ 59%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-3x3s2p1-minmax-aarch64-neonfma-3x4.c.o
[ 59%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/TensorProperties.cpp.o
[ 59%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-3x3s2p1-minmax-aarch64-neonfma-4x4.c.o
[ 59%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-5x5p2-minmax-aarch64-neonfma-1x4-acc2.c.o
[ 59%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-5x5p2-minmax-aarch64-neonfma-1x4-acc3.c.o
[ 59%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-5x5p2-minmax-aarch64-neonfma-1x4-acc4.c.o
[ 59%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/TensorShape.cpp.o
[ 59%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/TensorTransformations.cpp.o
[ 59%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-5x5p2-minmax-aarch64-neonfma-1x4-acc5.c.o
[ 59%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-5x5p2-minmax-aarch64-neonfma-1x4.c.o
[ 59%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-5x5p2-minmax-aarch64-neonfma-2x4-acc2.c.o
[ 60%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-5x5p2-minmax-aarch64-neonfma-2x4-acc3.c.o
[ 60%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-5x5p2-minmax-aarch64-neonfma-2x4.c.o
[ 60%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-5x5p2-minmax-aarch64-neonfma-3x4-acc2.c.o
[ 60%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-5x5p2-minmax-aarch64-neonfma-3x4.c.o
[ 60%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-5x5p2-minmax-aarch64-neonfma-4x4-acc2.c.o
[ 60%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/TestOps.cpp.o
[ 60%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-5x5p2-minmax-aarch64-neonfma-4x4.c.o
[ 60%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-5x5p2-minmax-aarch64-neonfma-5x4.c.o
[ 60%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-5x5s2p2-minmax-aarch64-neonfma-1x4-acc2.c.o
[ 60%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-5x5s2p2-minmax-aarch64-neonfma-1x4-acc3.c.o
[ 60%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-5x5s2p2-minmax-aarch64-neonfma-1x4-acc4.c.o
[ 60%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/TriangularOps.cpp.o
[ 60%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-5x5s2p2-minmax-aarch64-neonfma-1x4-acc5.c.o
[ 60%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-5x5s2p2-minmax-aarch64-neonfma-1x4.c.o
[ 60%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-5x5s2p2-minmax-aarch64-neonfma-2x4-acc2.c.o
[ 60%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-5x5s2p2-minmax-aarch64-neonfma-2x4-acc3.c.o
[ 60%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-5x5s2p2-minmax-aarch64-neonfma-2x4.c.o
[ 60%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-5x5s2p2-minmax-aarch64-neonfma-3x4-acc2.c.o
[ 60%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-5x5s2p2-minmax-aarch64-neonfma-3x4.c.o
[ 60%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/TypeProperties.cpp.o
[ 60%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-1x8-minmax-aarch64-neonfma-lane-ld64.c.o
[ 60%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-1x8-minmax-aarch64-neonfma-lane-ld128.c.o
[ 60%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-1x16-minmax-aarch64-neonfma-lane-ld128.c.o
[ 60%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-2x16-minmax-aarch64-neonfma-lane-ld128.c.o
[ 60%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-3x16-minmax-aarch64-neonfma-lane-ld128.c.o
[ 60%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-4x2-minmax-aarch64-neonfma-lane-ld64.c.o
[ 60%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-4x8-minmax-aarch64-neonfma-lane-ld64.c.o
[ 60%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-4x8-minmax-aarch64-neonfma-lane-ld128.c.o
[ 60%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-4x16-minmax-aarch64-neonfma-lane-ld128.c.o
[ 60%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-5x8-minmax-aarch64-neonfma-lane-ld64.c.o
[ 60%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-5x16-minmax-aarch64-neonfma-lane-ld128.c.o
[ 60%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-6x2-minmax-aarch64-neonfma-lane-ld64.c.o
[ 60%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-6x8-minmax-aarch64-neonfma-lane-ld64.c.o
[ 60%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/UnaryOps.cpp.o
[ 60%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-6x8-minmax-aarch64-neonfma-lane-ld128.c.o
[ 60%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-6x16-minmax-aarch64-neonfma-lane-ld128.c.o
[ 60%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemminc/gen/f32-gemminc-1x8-minmax-aarch64-neonfma-lane-ld64.c.o
[ 60%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/Unfold2d.cpp.o
[ 60%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemminc/gen/f32-gemminc-1x8-minmax-aarch64-neonfma-lane-ld128.c.o
[ 60%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemminc/gen/f32-gemminc-4x8-minmax-aarch64-neonfma-lane-ld64.c.o
[ 60%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemminc/gen/f32-gemminc-4x8-minmax-aarch64-neonfma-lane-ld128.c.o
[ 60%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/Unfold3d.cpp.o
[ 60%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemminc/gen/f32-gemminc-5x8-minmax-aarch64-neonfma-lane-ld64.c.o
[ 60%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemminc/gen/f32-gemminc-6x8-minmax-aarch64-neonfma-lane-ld64.c.o
[ 60%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/UnfoldBackward.cpp.o
[ 60%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemminc/gen/f32-gemminc-6x8-minmax-aarch64-neonfma-lane-ld128.c.o
[ 60%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-1x8-minmax-aarch64-neonfma-lane-ld64.c.o
[ 60%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-1x8-minmax-aarch64-neonfma-lane-ld128.c.o
[ 60%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-1x16-minmax-aarch64-neonfma-lane-ld128.c.o
[ 60%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-2x16-minmax-aarch64-neonfma-lane-ld128.c.o
[ 60%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-3x16-minmax-aarch64-neonfma-lane-ld128.c.o
[ 60%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-4x2-minmax-aarch64-neonfma-lane-ld64.c.o
[ 60%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-4x4-minmax-aarch64-neonfma-lane-ld64.c.o
[ 60%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-4x8-minmax-aarch64-neonfma-lane-ld64.c.o
[ 60%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-4x8-minmax-aarch64-neonfma-lane-ld128.c.o
[ 60%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-4x16-minmax-aarch64-neonfma-lane-ld128.c.o
[ 60%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-5x16-minmax-aarch64-neonfma-lane-ld128.c.o
[ 60%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-6x2-minmax-aarch64-neonfma-lane-ld64.c.o
[ 60%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/Unique.cpp.o
[ 60%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/UpSample.cpp.o
[ 60%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-6x8-minmax-aarch64-neonfma-lane-ld64.c.o
[ 60%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/UpSampleBicubic2d.cpp.o
[ 60%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-6x8-minmax-aarch64-neonfma-lane-ld128.c.o
[ 60%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-6x16-minmax-aarch64-neonfma-lane-ld128.c.o
[ 60%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-ppmm/gen/f32-ppmm-4x8-minmax-aarch64-neonfma-prfm.c.o
[ 60%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-ppmm/gen/f32-ppmm-4x8-minmax-aarch64-neonfma.c.o
[ 60%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-ppmm/gen/f32-ppmm-4x16-minmax-aarch64-neonfma-prfm.c.o
[ 60%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-ppmm/gen/f32-ppmm-4x16-minmax-aarch64-neonfma.c.o
[ 60%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-ppmm/gen/f32-ppmm-8x8-minmax-aarch64-neonfma-prfm.c.o
[ 60%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-ppmm/gen/f32-ppmm-8x8-minmax-aarch64-neonfma.c.o
[ 60%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/UpSampleBilinear2d.cpp.o
[ 61%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc4w-gemm/gen/f32-qc4w-gemm-1x8-minmax-aarch64-neonfma-lane-ld64.c.o
[ 61%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc4w-gemm/gen/f32-qc4w-gemm-1x8-minmax-aarch64-neonfma-lane-ld128.c.o
[ 61%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc4w-gemm/gen/f32-qc4w-gemm-4x8-minmax-aarch64-neonfma-lane-ld64.c.o
[ 61%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc4w-gemm/gen/f32-qc4w-gemm-4x8-minmax-aarch64-neonfma-lane-ld128.c.o
[ 61%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc4w-gemm/gen/f32-qc4w-gemm-5x8-minmax-aarch64-neonfma-lane-ld64.c.o
[ 61%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc4w-gemm/gen/f32-qc4w-gemm-6x8-minmax-aarch64-neonfma-lane-ld64.c.o
[ 61%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc4w-gemm/gen/f32-qc4w-gemm-6x8-minmax-aarch64-neonfma-lane-ld128.c.o
[ 61%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-1x8-minmax-aarch64-neonfma-lane-ld64.c.o
[ 61%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-1x8-minmax-aarch64-neonfma-lane-ld128.c.o
[ 61%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-1x16-minmax-aarch64-neonfma-lane-ld128.c.o
[ 61%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-4x2-minmax-aarch64-neonfma-lane-ld64.c.o
[ 61%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-4x8-minmax-aarch64-neonfma-lane-ld64.c.o
[ 61%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-4x8-minmax-aarch64-neonfma-lane-ld128.c.o
[ 61%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-4x16-minmax-aarch64-neonfma-lane-ld128.c.o
[ 61%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-5x8-minmax-aarch64-neonfma-lane-ld64.c.o
[ 61%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-6x2-minmax-aarch64-neonfma-lane-ld64.c.o
[ 61%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-6x8-minmax-aarch64-neonfma-lane-ld64.c.o
[ 61%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-6x8-minmax-aarch64-neonfma-lane-ld128.c.o
[ 61%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-spmm/gen/f32-spmm-4x2-minmax-aarch64-neonfma.c.o
[ 61%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-spmm/gen/f32-spmm-4x4-minmax-aarch64-neonfma.c.o
[ 61%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/UpSampleLinear1d.cpp.o
[ 61%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-spmm/gen/f32-spmm-8x2-minmax-aarch64-neonfma.c.o
[ 61%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-spmm/gen/f32-spmm-8x4-minmax-aarch64-neonfma.c.o
[ 61%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-spmm/gen/f32-spmm-12x2-minmax-aarch64-neonfma.c.o
[ 61%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-spmm/gen/f32-spmm-12x4-minmax-aarch64-neonfma.c.o
[ 61%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-spmm/gen/f32-spmm-16x2-minmax-aarch64-neonfma.c.o
[ 61%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-spmm/gen/f32-spmm-16x4-minmax-aarch64-neonfma.c.o
[ 61%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-spmm/gen/f32-spmm-32x2-minmax-aarch64-neonfma.c.o
[ 61%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-spmm/gen/f32-spmm-32x4-minmax-aarch64-neonfma.c.o
[ 61%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-aarch64-neonfma-rr1-lut64-p2-div-u4.c.o
[ 61%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-aarch64-neonfma-rr1-lut64-p2-div-u8.c.o
[ 61%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-aarch64-neonfma-rr1-lut64-p2-div-u12.c.o
[ 61%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-aarch64-neonfma-rr1-lut64-p2-div-u16.c.o
[ 61%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-aarch64-neonfma-rr1-lut64-p2-div-u20.c.o
[ 61%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-aarch64-neonfma-rr1-lut64-p2-div-u24.c.o
[ 61%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-aarch64-neonfma-rr1-lut2048-p1-div-u4.c.o
[ 61%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-aarch64-neonfma-rr1-lut2048-p1-div-u8.c.o
[ 61%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-aarch64-neonfma-rr1-lut2048-p1-div-u12.c.o
[ 61%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-aarch64-neonfma-rr1-lut2048-p1-div-u16.c.o
[ 61%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-aarch64-neonfma-rr1-lut2048-p1-div-u20.c.o
[ 61%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-aarch64-neonfma-rr1-lut2048-p1-div-u24.c.o
[ 61%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-aarch64-neonfma-rr1-p5-div-u4.c.o
[ 61%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-aarch64-neonfma-rr1-p5-div-u8.c.o
[ 61%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-aarch64-neonfma-rr1-p5-div-u12.c.o
[ 61%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-aarch64-neonfma-rr1-p5-div-u16.c.o
[ 61%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-aarch64-neonfma-rr1-p5-div-u20.c.o
[ 61%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/UpSampleNearest1d.cpp.o
[ 61%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-aarch64-neonfma-rr1-p5-div-u24.c.o
[ 61%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-aarch64-neonfma-expm1minus-rr1-lut8-p4h3ts-div-u4.c.o
[ 61%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-aarch64-neonfma-expm1minus-rr1-lut8-p4h3ts-div-u8.c.o
[ 61%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-aarch64-neonfma-expm1minus-rr1-lut8-p4h3ts-div-u12.c.o
[ 61%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/UpSampleNearest2d.cpp.o
[ 61%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-aarch64-neonfma-expm1minus-rr1-lut8-p4h3ts-div-u16.c.o
[ 61%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-aarch64-neonfma-expm1minus-rr1-p6h5ts-div-u4.c.o
[ 61%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-aarch64-neonfma-expm1minus-rr1-p6h5ts-div-u8.c.o
[ 61%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-aarch64-neonfma-expm1minus-rr1-p6h5ts-div-u12.c.o
[ 61%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-aarch64-neonfma-expm1minus-rr1-p6h5ts-div-u16.c.o
[ 61%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-sigmoid-aarch64-neonfma-rr1-lut64-p2-div.c.o
[ 61%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-sigmoid-aarch64-neonfma-rr1-lut2048-p1-div.c.o
[ 61%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-sigmoid-aarch64-neonfma-rr1-p5-div.c.o
[ 61%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-sigmoid-aarch64-neonfma-rr2-lut64-p2-div.c.o
[ 61%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-sigmoid-aarch64-neonfma-rr2-lut2048-p1-div.c.o
[ 61%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-sigmoid-aarch64-neonfma-rr2-p5-div.c.o
[ 62%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-aarch64-neonfma-expm1minus-rr1-lut8-p4h3ps-div.c.o
[ 62%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-aarch64-neonfma-expm1minus-rr1-p6h5ts-div.c.o
[ 62%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vadd-minmax-fp16arith-u1.c.o
[ 62%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vadd-minmax-fp16arith-u2.c.o
[ 62%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vadd-minmax-fp16arith-u4.c.o
[ 62%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vaddc-minmax-fp16arith-u1.c.o
[ 62%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vaddc-minmax-fp16arith-u2.c.o
[ 62%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vaddc-minmax-fp16arith-u4.c.o
[ 62%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vdiv-minmax-fp16arith-u1.c.o
[ 62%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vdiv-minmax-fp16arith-u2.c.o
[ 62%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vdiv-minmax-fp16arith-u4.c.o
[ 62%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vdivc-minmax-fp16arith-u1.c.o
[ 62%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vdivc-minmax-fp16arith-u2.c.o
[ 62%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vdivc-minmax-fp16arith-u4.c.o
[ 62%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vmax-fp16arith-u1.c.o
[ 62%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vmax-fp16arith-u2.c.o
[ 62%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vmax-fp16arith-u4.c.o
[ 62%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/UpSampleNearest3d.cpp.o
[ 62%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vmaxc-fp16arith-u1.c.o
[ 62%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vmaxc-fp16arith-u2.c.o
[ 62%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vmaxc-fp16arith-u4.c.o
[ 62%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vmin-fp16arith-u1.c.o
[ 62%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vmin-fp16arith-u2.c.o
[ 62%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vmin-fp16arith-u4.c.o
[ 62%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vminc-fp16arith-u1.c.o
[ 62%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/UpSampleTrilinear3d.cpp.o
[ 62%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vminc-fp16arith-u2.c.o
[ 62%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vminc-fp16arith-u4.c.o
[ 62%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vmul-minmax-fp16arith-u1.c.o
[ 62%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vmul-minmax-fp16arith-u2.c.o
[ 62%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vmul-minmax-fp16arith-u4.c.o
[ 62%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vmulc-minmax-fp16arith-u1.c.o
[ 62%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vmulc-minmax-fp16arith-u2.c.o
[ 62%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vmulc-minmax-fp16arith-u4.c.o
[ 62%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vrdivc-minmax-fp16arith-u1.c.o
[ 62%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vrdivc-minmax-fp16arith-u2.c.o
[ 62%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vrdivc-minmax-fp16arith-u4.c.o
[ 62%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vrsubc-minmax-fp16arith-u1.c.o
[ 62%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vrsubc-minmax-fp16arith-u2.c.o
[ 62%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vrsubc-minmax-fp16arith-u4.c.o
[ 62%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vsqrdiff-fp16arith-u1.c.o
[ 62%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vsqrdiff-fp16arith-u2.c.o
[ 62%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vsqrdiff-fp16arith-u4.c.o
[ 62%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vsqrdiffc-fp16arith-u1.c.o
[ 62%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vsqrdiffc-fp16arith-u2.c.o
[ 62%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vsqrdiffc-fp16arith-u4.c.o
[ 62%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vsub-minmax-fp16arith-u1.c.o
[ 62%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vsub-minmax-fp16arith-u2.c.o
[ 62%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vsub-minmax-fp16arith-u4.c.o
[ 62%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vsubc-minmax-fp16arith-u1.c.o
[ 62%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vsubc-minmax-fp16arith-u2.c.o
[ 62%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vsubc-minmax-fp16arith-u4.c.o
[ 62%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vsqrt/gen/f16-vsqrt-fp16arith-sqrt-u1.c.o
[ 62%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vsqrt/gen/f16-vsqrt-fp16arith-sqrt-u2.c.o
[ 62%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/VariableMethodStubs.cpp.o
[ 62%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vsqrt/gen/f16-vsqrt-fp16arith-sqrt-u4.c.o
[ 62%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc4w-gemm/gen/qd8-f16-qc4w-gemm-1x8c4-minmax-neondotfp16arith.c.o
[ 62%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc4w-gemm/gen/qd8-f16-qc4w-gemm-1x16c4-minmax-neondotfp16arith.c.o
[ 62%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc4w-gemm/gen/qd8-f16-qc4w-gemm-2x8c4-minmax-neondotfp16arith.c.o
[ 62%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc4w-gemm/gen/qd8-f16-qc4w-gemm-2x16c4-minmax-neondotfp16arith.c.o
[ 62%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/WeightNorm.cpp.o
[ 62%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc4w-gemm/gen/qd8-f16-qc4w-gemm-3x8c4-minmax-neondotfp16arith.c.o
[ 62%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc4w-gemm/gen/qd8-f16-qc4w-gemm-3x16c4-minmax-neondotfp16arith.c.o
[ 63%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc4w-gemm/gen/qd8-f16-qc4w-gemm-4x8c4-minmax-neondotfp16arith.c.o
[ 63%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc4w-gemm/gen/qd8-f16-qc4w-gemm-4x16c4-minmax-neondotfp16arith.c.o
[ 63%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc4w-gemm/gen/qd8-f16-qc4w-gemm-5x8c4-minmax-neondotfp16arith.c.o
[ 63%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc4w-gemm/gen/qd8-f16-qc4w-gemm-5x16c4-minmax-neondotfp16arith.c.o
[ 63%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc4w-gemm/gen/qd8-f16-qc4w-gemm-6x8c4-minmax-neondotfp16arith.c.o
[ 63%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc4w-gemm/gen/qd8-f16-qc4w-gemm-6x16c4-minmax-neondotfp16arith.c.o
[ 63%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc8w-gemm/gen/qd8-f16-qc8w-gemm-1x8c4-minmax-neondotfp16arith.c.o
[ 63%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc8w-gemm/gen/qd8-f16-qc8w-gemm-1x16c4-minmax-neondotfp16arith.c.o
[ 63%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc8w-gemm/gen/qd8-f16-qc8w-gemm-2x8c4-minmax-neondotfp16arith.c.o
[ 63%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc8w-gemm/gen/qd8-f16-qc8w-gemm-2x16c4-minmax-neondotfp16arith.c.o
[ 63%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc8w-gemm/gen/qd8-f16-qc8w-gemm-3x8c4-minmax-neondotfp16arith.c.o
[ 63%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc8w-gemm/gen/qd8-f16-qc8w-gemm-3x16c4-minmax-neondotfp16arith.c.o
[ 63%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc8w-gemm/gen/qd8-f16-qc8w-gemm-4x8c4-minmax-neondotfp16arith.c.o
[ 63%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc8w-gemm/gen/qd8-f16-qc8w-gemm-4x16c4-minmax-neondotfp16arith.c.o
[ 63%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc8w-gemm/gen/qd8-f16-qc8w-gemm-5x8c4-minmax-neondotfp16arith.c.o
[ 63%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/group_norm.cpp.o
[ 63%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc8w-gemm/gen/qd8-f16-qc8w-gemm-5x16c4-minmax-neondotfp16arith.c.o
[ 63%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc8w-gemm/gen/qd8-f16-qc8w-gemm-6x8c4-minmax-neondotfp16arith.c.o
[ 63%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc8w-gemm/gen/qd8-f16-qc8w-gemm-6x16c4-minmax-neondotfp16arith.c.o
[ 63%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc8w-igemm/gen/qd8-f16-qc8w-igemm-1x8c4-minmax-neondotfp16arith.c.o
[ 63%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc8w-igemm/gen/qd8-f16-qc8w-igemm-1x16c4-minmax-neondotfp16arith.c.o
[ 63%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc8w-igemm/gen/qd8-f16-qc8w-igemm-1x32c4-minmax-neondotfp16arith.c.o
[ 63%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc8w-igemm/gen/qd8-f16-qc8w-igemm-2x8c4-minmax-neondotfp16arith.c.o
[ 63%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc8w-igemm/gen/qd8-f16-qc8w-igemm-2x16c4-minmax-neondotfp16arith.c.o
[ 63%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc8w-igemm/gen/qd8-f16-qc8w-igemm-2x32c4-minmax-neondotfp16arith.c.o
[ 63%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc8w-igemm/gen/qd8-f16-qc8w-igemm-4x8c4-minmax-neondotfp16arith.c.o
[ 63%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc8w-igemm/gen/qd8-f16-qc8w-igemm-4x16c4-minmax-neondotfp16arith.c.o
[ 63%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc8w-igemm/gen/qd8-f16-qc8w-igemm-4x32c4-minmax-neondotfp16arith.c.o
[ 63%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc8w-igemm/gen/qd8-f16-qc8w-igemm-6x8c4-minmax-neondotfp16arith.c.o
[ 63%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/layer_norm.cpp.o
[ 63%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc8w-igemm/gen/qd8-f16-qc8w-igemm-6x16c4-minmax-neondotfp16arith.c.o
[ 63%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc8w-igemm/gen/qd8-f16-qc8w-igemm-6x32c4-minmax-neondotfp16arith.c.o
[ 63%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc8w-igemm/gen/qd8-f16-qc8w-igemm-8x8c4-minmax-neondotfp16arith.c.o
[ 63%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc8w-igemm/gen/qd8-f16-qc8w-igemm-8x16c4-minmax-neondotfp16arith.c.o
[ 63%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc8w-igemm/gen/qd8-f16-qc8w-igemm-8x32c4-minmax-neondotfp16arith.c.o
[ 63%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-avgpool/f16-avgpool-9p8x-minmax-neonfp16arith-c8.c.o
[ 63%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-avgpool/f16-avgpool-9x-minmax-neonfp16arith-c8.c.o
[ 63%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-conv-hwc2chw/f16-conv-hwc2chw-3x3s2p1c3x4-neonfp16arith-2x2.c.o
[ 63%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv2d-chw/gen/f16-dwconv2d-chw-3x3p1-minmax-neonfp16arith-1x8-acc2.c.o
[ 63%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv2d-chw/gen/f16-dwconv2d-chw-3x3p1-minmax-neonfp16arith-1x8-acc3.c.o
[ 63%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv2d-chw/gen/f16-dwconv2d-chw-3x3p1-minmax-neonfp16arith-1x8-acc4.c.o
[ 63%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv2d-chw/gen/f16-dwconv2d-chw-3x3p1-minmax-neonfp16arith-1x8.c.o
[ 63%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv2d-chw/gen/f16-dwconv2d-chw-3x3p1-minmax-neonfp16arith-2x8-acc2.c.o
[ 63%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv2d-chw/gen/f16-dwconv2d-chw-3x3p1-minmax-neonfp16arith-2x8.c.o
[ 63%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv2d-chw/gen/f16-dwconv2d-chw-3x3p1-minmax-neonfp16arith-3x8.c.o
[ 63%] Building C object
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv2d-chw/gen/f16-dwconv2d-chw-3x3p1-minmax-neonfp16arith-4x8.c.o [ 63%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv2d-chw/gen/f16-dwconv2d-chw-3x3p1-minmax-neonfp16arith-5x8.c.o [ 63%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv2d-chw/gen/f16-dwconv2d-chw-3x3p1-minmax-neonfp16arith-6x8.c.o [ 63%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv2d-chw/gen/f16-dwconv2d-chw-3x3s2p1-minmax-neonfp16arith-1x8-acc2.c.o [ 64%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/prim_native_functions.cpp.o [ 64%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv2d-chw/gen/f16-dwconv2d-chw-3x3s2p1-minmax-neonfp16arith-1x8-acc3.c.o [ 64%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv2d-chw/gen/f16-dwconv2d-chw-3x3s2p1-minmax-neonfp16arith-1x8-acc4.c.o [ 64%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv2d-chw/gen/f16-dwconv2d-chw-3x3s2p1-minmax-neonfp16arith-1x8.c.o [ 64%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv2d-chw/gen/f16-dwconv2d-chw-3x3s2p1-minmax-neonfp16arith-2x8-acc2.c.o [ 64%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv2d-chw/gen/f16-dwconv2d-chw-3x3s2p1-minmax-neonfp16arith-2x8.c.o [ 64%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv2d-chw/gen/f16-dwconv2d-chw-3x3s2p1-minmax-neonfp16arith-3x8.c.o [ 64%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv2d-chw/gen/f16-dwconv2d-chw-3x3s2p1-minmax-neonfp16arith-4x8.c.o [ 64%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv2d-chw/gen/f16-dwconv2d-chw-5x5p2-minmax-neonfp16arith-1x8-acc2.c.o [ 64%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv2d-chw/gen/f16-dwconv2d-chw-5x5p2-minmax-neonfp16arith-1x8-acc3.c.o [ 64%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv2d-chw/gen/f16-dwconv2d-chw-5x5p2-minmax-neonfp16arith-1x8-acc4.c.o [ 64%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv2d-chw/gen/f16-dwconv2d-chw-5x5p2-minmax-neonfp16arith-1x8-acc5.c.o [ 64%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv2d-chw/gen/f16-dwconv2d-chw-5x5p2-minmax-neonfp16arith-1x8.c.o [ 64%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv2d-chw/gen/f16-dwconv2d-chw-5x5p2-minmax-neonfp16arith-2x8-acc2.c.o [ 65%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv2d-chw/gen/f16-dwconv2d-chw-5x5p2-minmax-neonfp16arith-2x8-acc3.c.o [ 65%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv2d-chw/gen/f16-dwconv2d-chw-5x5p2-minmax-neonfp16arith-2x8.c.o [ 65%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/verbose_wrapper.cpp.o [ 65%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv2d-chw/gen/f16-dwconv2d-chw-5x5p2-minmax-neonfp16arith-3x8-acc2.c.o [ 65%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/ao_sparse/library.cpp.o [ 65%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/ao_sparse/quantized/cpu/fbgemm_utils.cpp.o 
[ 65%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv2d-chw/gen/f16-dwconv2d-chw-5x5p2-minmax-neonfp16arith-3x8.c.o [ 65%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv2d-chw/gen/f16-dwconv2d-chw-5x5p2-minmax-neonfp16arith-4x8-acc2.c.o [ 65%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv2d-chw/gen/f16-dwconv2d-chw-5x5p2-minmax-neonfp16arith-4x8.c.o [ 65%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv2d-chw/gen/f16-dwconv2d-chw-5x5p2-minmax-neonfp16arith-5x8.c.o [ 65%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv2d-chw/gen/f16-dwconv2d-chw-5x5s2p2-minmax-neonfp16arith-1x8-acc2.c.o [ 65%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv2d-chw/gen/f16-dwconv2d-chw-5x5s2p2-minmax-neonfp16arith-1x8-acc3.c.o [ 65%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv2d-chw/gen/f16-dwconv2d-chw-5x5s2p2-minmax-neonfp16arith-1x8-acc4.c.o [ 65%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv2d-chw/gen/f16-dwconv2d-chw-5x5s2p2-minmax-neonfp16arith-1x8-acc5.c.o [ 65%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv2d-chw/gen/f16-dwconv2d-chw-5x5s2p2-minmax-neonfp16arith-1x8.c.o [ 65%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv2d-chw/gen/f16-dwconv2d-chw-5x5s2p2-minmax-neonfp16arith-2x8-acc2.c.o [ 65%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv2d-chw/gen/f16-dwconv2d-chw-5x5s2p2-minmax-neonfp16arith-2x8-acc3.c.o [ 65%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv2d-chw/gen/f16-dwconv2d-chw-5x5s2p2-minmax-neonfp16arith-2x8.c.o [ 65%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv2d-chw/gen/f16-dwconv2d-chw-5x5s2p2-minmax-neonfp16arith-3x8-acc2.c.o [ 65%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv2d-chw/gen/f16-dwconv2d-chw-5x5s2p2-minmax-neonfp16arith-3x8.c.o [ 65%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv/gen/f16-dwconv-3p8c-minmax-neonfp16arith-acc2.c.o [ 65%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv/gen/f16-dwconv-3p8c-minmax-neonfp16arith.c.o [ 65%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv/gen/f16-dwconv-3p16c-minmax-neonfp16arith-acc2.c.o [ 65%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv/gen/f16-dwconv-3p16c-minmax-neonfp16arith.c.o [ 65%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv/gen/f16-dwconv-3p32c-minmax-neonfp16arith-acc2.c.o [ 65%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv/gen/f16-dwconv-3p32c-minmax-neonfp16arith.c.o [ 65%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv/gen/f16-dwconv-4p8c-minmax-neonfp16arith-acc2.c.o [ 65%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv/gen/f16-dwconv-4p8c-minmax-neonfp16arith.c.o [ 65%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv/gen/f16-dwconv-4p16c-minmax-neonfp16arith-acc2.c.o [ 65%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv/gen/f16-dwconv-4p16c-minmax-neonfp16arith.c.o [ 65%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv/gen/f16-dwconv-4p32c-minmax-neonfp16arith-acc2.c.o [ 65%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv/gen/f16-dwconv-4p32c-minmax-neonfp16arith.c.o [ 65%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv/gen/f16-dwconv-5f5m5l8c8s4r-minmax-neonfp16arith-acc2.c.o [ 65%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/ao_sparse/quantized/cpu/qlinear.cpp.o [ 65%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv/gen/f16-dwconv-5f5m5l8c8s4r-minmax-neonfp16arith.c.o [ 65%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv/gen/f16-dwconv-5f5m5l16c8s4r-minmax-neonfp16arith-acc2.c.o [ 65%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv/gen/f16-dwconv-5f5m5l16c8s4r-minmax-neonfp16arith.c.o [ 65%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv/gen/f16-dwconv-5f5m5l32c8s4r-minmax-neonfp16arith-acc2.c.o [ 65%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv/gen/f16-dwconv-5f5m5l32c8s4r-minmax-neonfp16arith.c.o [ 65%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv/gen/f16-dwconv-6f6m7l8c8s4r-minmax-neonfp16arith-acc2.c.o [ 65%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv/gen/f16-dwconv-6f6m7l8c8s4r-minmax-neonfp16arith.c.o [ 65%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv/gen/f16-dwconv-6f6m7l16c8s4r-minmax-neonfp16arith-acc2.c.o [ 65%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv/gen/f16-dwconv-6f6m7l16c8s4r-minmax-neonfp16arith.c.o [ 65%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv/gen/f16-dwconv-6f6m7l32c8s4r-minmax-neonfp16arith-acc2.c.o [ 65%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv/gen/f16-dwconv-6f6m7l32c8s4r-minmax-neonfp16arith.c.o [ 65%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv/gen/f16-dwconv-8f8m9l8c8s4r-minmax-neonfp16arith-acc2.c.o [ 65%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv/gen/f16-dwconv-8f8m9l8c8s4r-minmax-neonfp16arith.c.o [ 65%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv/gen/f16-dwconv-8f8m9l16c8s4r-minmax-neonfp16arith-acc2.c.o [ 65%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv/gen/f16-dwconv-8f8m9l16c8s4r-minmax-neonfp16arith.c.o [ 65%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv/gen/f16-dwconv-8f8m9l32c8s4r-minmax-neonfp16arith-acc2.c.o [ 65%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv/gen/f16-dwconv-8f8m9l32c8s4r-minmax-neonfp16arith.c.o [ 65%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv/gen/f16-dwconv-9p8c-minmax-neonfp16arith-acc2.c.o [ 65%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv/gen/f16-dwconv-9p8c-minmax-neonfp16arith.c.o [ 65%] Building CXX object 
caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/ao_sparse/quantized/cpu/qlinear_deserialize.cpp.o [ 65%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv/gen/f16-dwconv-9p16c-minmax-neonfp16arith-acc2.c.o [ 65%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv/gen/f16-dwconv-9p16c-minmax-neonfp16arith.c.o [ 65%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv/gen/f16-dwconv-9p32c-minmax-neonfp16arith-acc2.c.o [ 65%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv/gen/f16-dwconv-9p32c-minmax-neonfp16arith.c.o [ 65%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv/gen/f16-dwconv-25p8c-minmax-neonfp16arith-acc2.c.o [ 65%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv/gen/f16-dwconv-25p8c-minmax-neonfp16arith.c.o [ 65%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv/gen/f16-dwconv-25p16c-minmax-neonfp16arith-acc2.c.o [ 65%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv/gen/f16-dwconv-25p16c-minmax-neonfp16arith.c.o [ 65%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv/gen/f16-dwconv-25p32c-minmax-neonfp16arith-acc2.c.o [ 65%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv/gen/f16-dwconv-25p32c-minmax-neonfp16arith.c.o [ 65%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-gavgpool-cw/f16-gavgpool-cw-neonfp16arith-u8.c.o [ 66%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-gavgpool/gen/f16-gavgpool-7p7x-minmax-neonfp16arith-c8.c.o [ 66%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-gavgpool/gen/f16-gavgpool-7p7x-minmax-neonfp16arith-c16.c.o [ 66%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-gavgpool/gen/f16-gavgpool-7p7x-minmax-neonfp16arith-c24.c.o [ 66%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/ao_sparse/quantized/cpu/qlinear_dynamic.cpp.o [ 66%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-gavgpool/gen/f16-gavgpool-7p7x-minmax-neonfp16arith-c32.c.o [ 66%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-gavgpool/gen/f16-gavgpool-7x-minmax-neonfp16arith-c8.c.o [ 66%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-gavgpool/gen/f16-gavgpool-7x-minmax-neonfp16arith-c16.c.o [ 66%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-gavgpool/gen/f16-gavgpool-7x-minmax-neonfp16arith-c24.c.o [ 66%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-gavgpool/gen/f16-gavgpool-7x-minmax-neonfp16arith-c32.c.o [ 66%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-gemm/gen/f16-gemm-1x8-minmax-neonfp16arith-ld64.c.o [ 66%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-gemm/gen/f16-gemm-1x16-minmax-neonfp16arith-ld64.c.o [ 66%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-gemm/gen/f16-gemm-4x8-minmax-neonfp16arith-ld64.c.o [ 66%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-gemm/gen/f16-gemm-4x16-minmax-neonfp16arith-ld64.c.o [ 66%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-gemm/gen/f16-gemm-6x8-minmax-neonfp16arith-ld64.c.o [ 66%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-gemm/gen/f16-gemm-6x16-minmax-neonfp16arith-ld64.c.o [ 66%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-gemm/gen/f16-gemm-8x8-minmax-neonfp16arith-ld64.c.o [ 66%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-gemm/gen/f16-gemm-8x16-minmax-neonfp16arith-ld64.c.o [ 66%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-gemm/gen/f16-gemminc-1x8-minmax-neonfp16arith-ld64.c.o [ 66%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-gemm/gen/f16-gemminc-1x16-minmax-neonfp16arith-ld64.c.o [ 66%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-gemm/gen/f16-gemminc-4x8-minmax-neonfp16arith-ld64.c.o [ 66%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-gemm/gen/f16-gemminc-4x16-minmax-neonfp16arith-ld64.c.o [ 66%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-gemm/gen/f16-gemminc-6x8-minmax-neonfp16arith-ld64.c.o [ 66%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-gemm/gen/f16-gemminc-6x16-minmax-neonfp16arith-ld64.c.o [ 66%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-gemm/gen/f16-gemminc-8x8-minmax-neonfp16arith-ld64.c.o [ 66%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-gemm/gen/f16-gemminc-8x16-minmax-neonfp16arith-ld64.c.o [ 66%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-ibilinear-chw/gen/f16-ibilinear-chw-neonfp16arith-p4.c.o [ 66%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-ibilinear-chw/gen/f16-ibilinear-chw-neonfp16arith-p8.c.o [ 66%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-ibilinear-chw/gen/f16-ibilinear-chw-neonfp16arith-p16.c.o [ 66%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-ibilinear/gen/f16-ibilinear-neonfp16arith-c8.c.o [ 66%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-ibilinear/gen/f16-ibilinear-neonfp16arith-c16.c.o [ 66%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-igemm/gen/f16-igemm-1x8-minmax-neonfp16arith-ld64.c.o [ 66%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-igemm/gen/f16-igemm-1x16-minmax-neonfp16arith-ld64.c.o [ 66%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-igemm/gen/f16-igemm-4x8-minmax-neonfp16arith-ld64.c.o [ 66%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-igemm/gen/f16-igemm-4x16-minmax-neonfp16arith-ld64.c.o [ 66%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-igemm/gen/f16-igemm-6x8-minmax-neonfp16arith-ld64.c.o [ 66%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-igemm/gen/f16-igemm-6x16-minmax-neonfp16arith-ld64.c.o [ 66%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-igemm/gen/f16-igemm-8x8-minmax-neonfp16arith-ld64.c.o [ 66%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-igemm/gen/f16-igemm-8x16-minmax-neonfp16arith-ld64.c.o [ 66%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-maxpool/f16-maxpool-9p8x-minmax-neonfp16arith-c8.c.o [ 66%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-pavgpool/f16-pavgpool-9p8x-minmax-neonfp16arith-c8.c.o [ 66%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-pavgpool/f16-pavgpool-9x-minmax-neonfp16arith-c8.c.o [ 66%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-prelu/gen/f16-prelu-neonfp16arith-2x8.c.o [ 66%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/ao_sparse/quantized/cpu/qlinear_prepack.cpp.o [ 66%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-prelu/gen/f16-prelu-neonfp16arith-2x16.c.o [ 66%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-qs8-vcvt/gen/f16-qs8-vcvt-neonfp16arith-u8.c.o [ 66%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-qs8-vcvt/gen/f16-qs8-vcvt-neonfp16arith-u16.c.o [ 66%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-qs8-vcvt/gen/f16-qs8-vcvt-neonfp16arith-u24.c.o [ 66%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/ao_sparse/quantized/cpu/qlinear_serialize.cpp.o [ 66%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-qs8-vcvt/gen/f16-qs8-vcvt-neonfp16arith-u32.c.o [ 66%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-qs8-vcvt/gen/f16-qs8-vcvt-neonfp16arith-u64.c.o [ 66%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-raddstoreexpminusmax/gen/f16-raddstoreexpminusmax-neonfp16arith-rr2-p2-u32-acc2.c.o [ 66%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-raddstoreexpminusmax/gen/f16-raddstoreexpminusmax-neonfp16arith-rr2-p2-u32-acc4.c.o [ 66%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-raddstoreexpminusmax/gen/f16-raddstoreexpminusmax-neonfp16arith-rr2-p2-u32.c.o [ 66%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-raddstoreexpminusmax/gen/f16-raddstoreexpminusmax-neonfp16arith-rr2-p2-u40-acc2.c.o [ 66%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-raddstoreexpminusmax/gen/f16-raddstoreexpminusmax-neonfp16arith-rr2-p2-u40-acc5.c.o [ 66%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-raddstoreexpminusmax/gen/f16-raddstoreexpminusmax-neonfp16arith-rr2-p2-u40.c.o [ 66%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-raddstoreexpminusmax/gen/f16-raddstoreexpminusmax-neonfp16arith-rr2-p2-u48-acc2.c.o [ 66%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-raddstoreexpminusmax/gen/f16-raddstoreexpminusmax-neonfp16arith-rr2-p2-u48-acc3.c.o [ 66%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-raddstoreexpminusmax/gen/f16-raddstoreexpminusmax-neonfp16arith-rr2-p2-u48.c.o [ 66%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-raddstoreexpminusmax/gen/f16-raddstoreexpminusmax-neonfp16arith-rr2-p2-u64-acc2.c.o [ 66%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-raddstoreexpminusmax/gen/f16-raddstoreexpminusmax-neonfp16arith-rr2-p2-u64-acc4.c.o [ 66%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-raddstoreexpminusmax/gen/f16-raddstoreexpminusmax-neonfp16arith-rr2-p2-u64.c.o [ 67%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-raddstoreexpminusmax/gen/f16-raddstoreexpminusmax-neonfp16arith-rr2-p2-u72-acc3.c.o [ 67%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-raddstoreexpminusmax/gen/f16-raddstoreexpminusmax-neonfp16arith-rr2-p2-u72.c.o [ 67%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-raddstoreexpminusmax/gen/f16-raddstoreexpminusmax-neonfp16arith-rr2-p2-u80-acc2.c.o [ 67%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-raddstoreexpminusmax/gen/f16-raddstoreexpminusmax-neonfp16arith-rr2-p2-u80-acc5.c.o [ 67%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-raddstoreexpminusmax/gen/f16-raddstoreexpminusmax-neonfp16arith-rr2-p2-u80.c.o [ 67%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-raddstoreexpminusmax/gen/f16-raddstoreexpminusmax-neonfp16arith-rr2-p2-u96-acc2.c.o [ 67%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-raddstoreexpminusmax/gen/f16-raddstoreexpminusmax-neonfp16arith-rr2-p2-u96-acc3.c.o [ 67%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-raddstoreexpminusmax/gen/f16-raddstoreexpminusmax-neonfp16arith-rr2-p2-u96-acc6.c.o [ 67%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-raddstoreexpminusmax/gen/f16-raddstoreexpminusmax-neonfp16arith-rr2-p2-u96.c.o [ 67%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-rminmax/gen/f16-rmax-neonfp16arith-u8.c.o [ 67%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-rminmax/gen/f16-rmax-neonfp16arith-u16-acc1.c.o [ 67%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-rminmax/gen/f16-rmax-neonfp16arith-u16-acc2.c.o [ 67%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-rminmax/gen/f16-rmax-neonfp16arith-u24-acc2.c.o [ 67%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-rminmax/gen/f16-rmax-neonfp16arith-u24-acc3.c.o [ 67%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-rminmax/gen/f16-rmax-neonfp16arith-u24.c.o [ 67%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-rminmax/gen/f16-rmax-neonfp16arith-u32-acc2.c.o [ 67%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-rminmax/gen/f16-rmax-neonfp16arith-u32-acc4.c.o [ 67%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-rminmax/gen/f16-rmax-neonfp16arith-u32.c.o [ 67%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-rminmax/gen/f16-rmax-neonfp16arith-u64-acc2.c.o [ 67%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-rminmax/gen/f16-rmax-neonfp16arith-u64-acc4.c.o [ 67%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-rminmax/gen/f16-rmax-neonfp16arith-u64.c.o [ 67%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-rminmax/gen/f16-rmin-neonfp16arith-u8.c.o [ 67%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-rminmax/gen/f16-rmin-neonfp16arith-u16-acc1.c.o [ 67%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-rminmax/gen/f16-rmin-neonfp16arith-u16-acc2.c.o [ 67%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-rminmax/gen/f16-rmin-neonfp16arith-u24-acc2.c.o [ 67%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-rminmax/gen/f16-rmin-neonfp16arith-u24-acc3.c.o [ 67%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-rminmax/gen/f16-rmin-neonfp16arith-u24.c.o [ 67%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-rminmax/gen/f16-rmin-neonfp16arith-u32-acc2.c.o [ 67%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-rminmax/gen/f16-rmin-neonfp16arith-u32-acc4.c.o [ 67%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-rminmax/gen/f16-rmin-neonfp16arith-u32.c.o [ 67%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-rminmax/gen/f16-rmin-neonfp16arith-u64-acc2.c.o [ 67%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-rminmax/gen/f16-rmin-neonfp16arith-u64-acc4.c.o [ 67%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-rminmax/gen/f16-rmin-neonfp16arith-u64.c.o [ 67%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/ao_sparse/quantized/cpu/qlinear_unpack.cpp.o [ 67%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-rminmax/gen/f16-rminmax-neonfp16arith-u8.c.o [ 67%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-rminmax/gen/f16-rminmax-neonfp16arith-u16-acc1.c.o [ 67%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-rminmax/gen/f16-rminmax-neonfp16arith-u16-acc2.c.o [ 67%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-rminmax/gen/f16-rminmax-neonfp16arith-u24-acc2.c.o [ 67%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-rminmax/gen/f16-rminmax-neonfp16arith-u24-acc3.c.o [ 67%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-rminmax/gen/f16-rminmax-neonfp16arith-u24.c.o [ 67%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-rminmax/gen/f16-rminmax-neonfp16arith-u32-acc2.c.o [ 67%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-rminmax/gen/f16-rminmax-neonfp16arith-u32-acc4.c.o [ 67%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-rminmax/gen/f16-rminmax-neonfp16arith-u32.c.o [ 67%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-rminmax/gen/f16-rminmax-neonfp16arith-u64-acc2.c.o [ 67%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-rminmax/gen/f16-rminmax-neonfp16arith-u64-acc4.c.o [ 67%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/sparse/FlattenIndicesKernel.cpp.o [ 67%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-rminmax/gen/f16-rminmax-neonfp16arith-u64.c.o [ 67%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-rsum/gen/f16-rsum-neonfp16arith-u8.c.o [ 67%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-rsum/gen/f16-rsum-neonfp16arith-u16-acc2.c.o [ 67%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-rsum/gen/f16-rsum-neonfp16arith-u24-acc3.c.o [ 67%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-rsum/gen/f16-rsum-neonfp16arith-u32-acc2.c.o [ 67%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-rsum/gen/f16-rsum-neonfp16arith-u32-acc4.c.o [ 67%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-spmm/gen/f16-spmm-8x1-minmax-neonfp16arith-pipelined.c.o [ 67%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-spmm/gen/f16-spmm-8x1-minmax-neonfp16arith-x2.c.o [ 67%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-spmm/gen/f16-spmm-8x1-minmax-neonfp16arith.c.o [ 67%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-spmm/gen/f16-spmm-16x1-minmax-neonfp16arith-pipelined.c.o [ 67%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-spmm/gen/f16-spmm-16x1-minmax-neonfp16arith-x2.c.o [ 67%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-spmm/gen/f16-spmm-16x1-minmax-neonfp16arith.c.o [ 67%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-spmm/gen/f16-spmm-24x1-minmax-neonfp16arith-pipelined.c.o [ 67%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-spmm/gen/f16-spmm-24x1-minmax-neonfp16arith-x2.c.o [ 67%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-spmm/gen/f16-spmm-24x1-minmax-neonfp16arith.c.o [ 67%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-spmm/gen/f16-spmm-32x1-minmax-neonfp16arith-pipelined.c.o [ 68%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-spmm/gen/f16-spmm-32x1-minmax-neonfp16arith-x2.c.o [ 68%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-spmm/gen/f16-spmm-32x1-minmax-neonfp16arith.c.o [ 68%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vadd-minmax-neonfp16arith-u8.c.o [ 68%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vadd-minmax-neonfp16arith-u16.c.o [ 68%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vaddc-minmax-neonfp16arith-u8.c.o [ 68%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vaddc-minmax-neonfp16arith-u16.c.o [ 68%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vmax-neonfp16arith-u8.c.o [ 68%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vmax-neonfp16arith-u16.c.o [ 68%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vmaxc-neonfp16arith-u8.c.o [ 68%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vmaxc-neonfp16arith-u16.c.o [ 68%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vmin-neonfp16arith-u8.c.o [ 68%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vmin-neonfp16arith-u16.c.o [ 68%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vminc-neonfp16arith-u8.c.o [ 68%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vminc-neonfp16arith-u16.c.o [ 68%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vmul-minmax-neonfp16arith-u8.c.o [ 
68%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vmul-minmax-neonfp16arith-u16.c.o [ 68%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vmulc-minmax-neonfp16arith-u8.c.o [ 68%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vmulc-minmax-neonfp16arith-u16.c.o [ 68%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vrsubc-minmax-neonfp16arith-u8.c.o [ 68%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vrsubc-minmax-neonfp16arith-u16.c.o [ 68%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vsqrdiff-neonfp16arith-u8.c.o [ 68%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vsqrdiff-neonfp16arith-u16.c.o [ 68%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/sparse/ParamUtils.cpp.o [ 68%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vsqrdiffc-neonfp16arith-u8.c.o [ 68%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vsqrdiffc-neonfp16arith-u16.c.o [ 68%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vsub-minmax-neonfp16arith-u8.c.o [ 68%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vsub-minmax-neonfp16arith-u16.c.o [ 68%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/sparse/SoftMax.cpp.o [ 68%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vsubc-minmax-neonfp16arith-u8.c.o [ 68%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vsubc-minmax-neonfp16arith-u16.c.o [ 68%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vclamp/gen/f16-vclamp-neonfp16arith-u8.c.o [ 68%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vclamp/gen/f16-vclamp-neonfp16arith-u16.c.o [ 68%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vcmul/gen/f16-vcmul-neonfp16arith-u8.c.o [ 68%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vcmul/gen/f16-vcmul-neonfp16arith-u16.c.o [ 68%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vcmul/gen/f16-vcmul-neonfp16arith-u32.c.o [ 68%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-velu/gen/f16-velu-neonfp16arith-rr1-p3-u8.c.o [ 68%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-velu/gen/f16-velu-neonfp16arith-rr1-p3-u16.c.o [ 68%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vhswish/gen/f16-vhswish-neonfp16arith-u8.c.o [ 68%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vhswish/gen/f16-vhswish-neonfp16arith-u16.c.o [ 68%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vlrelu/gen/f16-vlrelu-neonfp16arith-u8.c.o [ 68%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vlrelu/gen/f16-vlrelu-neonfp16arith-u16.c.o [ 68%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vmulcaddc/gen/f16-vmulcaddc-c8-minmax-neonfp16arith-2x.c.o [ 68%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vmulcaddc/gen/f16-vmulcaddc-c16-minmax-neonfp16arith-2x.c.o [ 68%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vrnd/gen/f16-vrndd-neonfp16arith-u8.c.o [ 68%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vrnd/gen/f16-vrndd-neonfp16arith-u16.c.o [ 68%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vrnd/gen/f16-vrndne-neonfp16arith-u8.c.o [ 68%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vrnd/gen/f16-vrndne-neonfp16arith-u16.c.o [ 68%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vrnd/gen/f16-vrndu-neonfp16arith-u8.c.o [ 68%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vrnd/gen/f16-vrndu-neonfp16arith-u16.c.o [ 68%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vrnd/gen/f16-vrndz-neonfp16arith-u8.c.o [ 68%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vrnd/gen/f16-vrndz-neonfp16arith-u16.c.o [ 68%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vsigmoid/gen/f16-vsigmoid-neonfp16arith-rr2-p2-nr1fma-u8.c.o [ 68%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vsigmoid/gen/f16-vsigmoid-neonfp16arith-rr2-p2-nr1fma-u16.c.o [ 68%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vsigmoid/gen/f16-vsigmoid-neonfp16arith-rr2-p2-nr1fma-u24.c.o [ 68%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/sparse/SparseBinaryOpIntersectionKernel.cpp.o [ 68%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vsigmoid/gen/f16-vsigmoid-neonfp16arith-rr2-p2-nr1fma-u32.c.o [ 68%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vsigmoid/gen/f16-vsigmoid-neonfp16arith-rr2-p2-nr1fma-u40.c.o [ 68%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vsigmoid/gen/f16-vsigmoid-neonfp16arith-rr2-p2-nr1fma-u48.c.o [ 68%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vsigmoid/gen/f16-vsigmoid-neonfp16arith-rr2-p2-nr1fma-u56.c.o [ 68%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vsigmoid/gen/f16-vsigmoid-neonfp16arith-rr2-p2-nr1fma-u64.c.o [ 68%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vsigmoid/gen/f16-vsigmoid-neonfp16arith-rr2-p2-nr1recps-u8.c.o [ 68%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vsigmoid/gen/f16-vsigmoid-neonfp16arith-rr2-p2-nr1recps-u16.c.o [ 68%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vsigmoid/gen/f16-vsigmoid-neonfp16arith-rr2-p2-nr1recps-u24.c.o [ 69%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vsigmoid/gen/f16-vsigmoid-neonfp16arith-rr2-p2-nr1recps-u32.c.o [ 69%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vsigmoid/gen/f16-vsigmoid-neonfp16arith-rr2-p2-nr1recps-u40.c.o [ 69%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vsigmoid/gen/f16-vsigmoid-neonfp16arith-rr2-p2-nr1recps-u48.c.o [ 69%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vsigmoid/gen/f16-vsigmoid-neonfp16arith-rr2-p2-nr1recps-u56.c.o [ 69%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vsigmoid/gen/f16-vsigmoid-neonfp16arith-rr2-p2-nr1recps-u64.c.o [ 69%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vsqrt/gen/f16-vsqrt-neonfp16arith-nr1fma1adj-u8.c.o [ 69%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vsqrt/gen/f16-vsqrt-neonfp16arith-nr1fma1adj-u32.c.o [ 69%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vsqrt/gen/f16-vsqrt-neonfp16arith-nr1fma1adj-u16.c.o [ 69%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vtanh/gen/f16-vtanh-neonfp16arith-expm1minus-rr1-p3h2ts-nr1fma-u8.c.o [ 69%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vtanh/gen/f16-vtanh-neonfp16arith-expm1minus-rr1-p3h2ts-nr1fma-u16.c.o [ 69%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vtanh/gen/f16-vtanh-neonfp16arith-expm1minus-rr1-p3h2ts-nr1fma-u24.c.o [ 69%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vtanh/gen/f16-vtanh-neonfp16arith-expm1minus-rr1-p3h2ts-nr1fma-u32.c.o [ 69%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/sparse/SparseBlas.cpp.o [ 69%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vtanh/gen/f16-vtanh-neonfp16arith-expm1minus-rr1-p3h2ts-nr1fma-u40.c.o [ 69%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/sparse/SparseBlasImpl.cpp.o [ 69%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vtanh/gen/f16-vtanh-neonfp16arith-expm1minus-rr1-p3h2ts-nr1fma-u48.c.o [ 69%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vtanh/gen/f16-vtanh-neonfp16arith-expm1minus-rr1-p3h2ts-nr1fma-u56.c.o [ 69%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vtanh/gen/f16-vtanh-neonfp16arith-expm1minus-rr1-p3h2ts-nr1fma-u64.c.o [ 69%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vtanh/gen/f16-vtanh-neonfp16arith-expm1minus-rr1-p3h2ts-nr1fma-u72.c.o [ 69%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vtanh/gen/f16-vtanh-neonfp16arith-expm1minus-rr1-p3h2ts-nr1fma-u80.c.o [ 69%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vtanh/gen/f16-vtanh-neonfp16arith-expm1minus-rr1-p3h2ts-nr1recps-u8.c.o [ 69%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vtanh/gen/f16-vtanh-neonfp16arith-expm1minus-rr1-p3h2ts-nr1recps-u16.c.o [ 69%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vtanh/gen/f16-vtanh-neonfp16arith-expm1minus-rr1-p3h2ts-nr1recps-u24.c.o [ 69%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vtanh/gen/f16-vtanh-neonfp16arith-expm1minus-rr1-p3h2ts-nr1recps-u32.c.o [ 69%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vtanh/gen/f16-vtanh-neonfp16arith-expm1minus-rr1-p3h2ts-nr1recps-u40.c.o [ 69%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vtanh/gen/f16-vtanh-neonfp16arith-expm1minus-rr1-p3h2ts-nr1recps-u48.c.o [ 69%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vtanh/gen/f16-vtanh-neonfp16arith-expm1minus-rr1-p3h2ts-nr1recps-u56.c.o [ 69%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vtanh/gen/f16-vtanh-neonfp16arith-expm1minus-rr1-p3h2ts-nr1recps-u64.c.o 
[ 69%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vtanh/gen/f16-vtanh-neonfp16arith-expm1minus-rr1-p3h2ts-nr1recps-u72.c.o [ 69%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vtanh/gen/f16-vtanh-neonfp16arith-expm1minus-rr1-p3h2ts-nr1recps-u80.c.o [ 69%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vtanh/gen/f16-vtanh-neonfp16arith-expm1minus-rr1-p3h2ts-recpeadj-u8.c.o [ 69%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vtanh/gen/f16-vtanh-neonfp16arith-expm1minus-rr1-p3h2ts-recpeadj-u16.c.o [ 69%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/sparse/SparseCsrTensor.cpp.o [ 69%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vtanh/gen/f16-vtanh-neonfp16arith-expm1minus-rr1-p3h2ts-recpeadj-u24.c.o [ 69%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vtanh/gen/f16-vtanh-neonfp16arith-expm1minus-rr1-p3h2ts-recpeadj-u32.c.o [ 69%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vtanh/gen/f16-vtanh-neonfp16arith-expm1minus-rr1-p3h2ts-recpeadj-u40.c.o [ 69%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vtanh/gen/f16-vtanh-neonfp16arith-expm1minus-rr1-p3h2ts-recpeadj-u48.c.o [ 69%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vtanh/gen/f16-vtanh-neonfp16arith-expm1minus-rr1-p3h2ts-recpeadj-u56.c.o [ 69%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vtanh/gen/f16-vtanh-neonfp16arith-expm1minus-rr1-p3h2ts-recpeadj-u64.c.o [ 69%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vtanh/gen/f16-vtanh-neonfp16arith-expm1minus-rr1-p3h2ts-recpeadj-u72.c.o [ 69%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vtanh/gen/f16-vtanh-neonfp16arith-expm1minus-rr1-p3h2ts-recpeadj-u80.c.o [ 69%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vunary/gen/f16-vabs-neonfp16arith-u8.c.o [ 69%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/sparse/SparseCsrTensorMath.cpp.o [ 69%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vunary/gen/f16-vabs-neonfp16arith-u16.c.o [ 69%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vunary/gen/f16-vneg-neonfp16arith-u8.c.o [ 69%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vunary/gen/f16-vneg-neonfp16arith-u16.c.o [ 69%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/sparse/SparseFactories.cpp.o [ 69%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vunary/gen/f16-vsqr-neonfp16arith-u8.c.o [ 69%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vunary/gen/f16-vsqr-neonfp16arith-u16.c.o [ 69%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f16-exp-neonfp16arith-rr2-p3.c.o [ 69%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f16-expm1minus-neonfp16arith-rr1-p3.c.o [ 69%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f16-expm1minus-neonfp16arith-rr2-p3.c.o [ 69%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f16-expminus-neonfp16arith-rr1-p2.c.o [ 69%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f16-expminus-neonfp16arith-rr1-p3.c.o [ 69%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f16-expminus-neonfp16arith-rr2-p2.c.o [ 69%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f16-expminus-neonfp16arith-rr2-p3.c.o [ 69%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f16-sigmoid-neonfp16arith-rr2-p2-nr1fma.c.o [ 69%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f16-sigmoid-neonfp16arith-rr2-p2-nr1recps.c.o [ 69%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f16-sigmoid-neonfp16arith-rr2-p2-recpe.c.o [ 69%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f16-sigmoid-neonfp16arith-rr2-p3-nr1fma.c.o [ 69%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f16-sigmoid-neonfp16arith-rr2-p3-nr1recps.c.o [ 69%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f16-sigmoid-neonfp16arith-rr2-p3-recpe.c.o [ 69%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f16-sqrt-neonfp16arith-nr1fma1adj.c.o [ 69%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f16-sqrt-neonfp16arith-nr1fma.c.o [ 69%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/sparse/SparseMatMul.cpp.o [ 70%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f16-sqrt-neonfp16arith-nr1rsqrts.c.o [ 70%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f16-tanh-neonfp16arith-expm1minus-rr1-p3h1ts-nr1fma.c.o [ 70%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/sparse/SparseTensor.cpp.o [ 70%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f16-tanh-neonfp16arith-expm1minus-rr1-p3h1ts-nr1fmaadj.c.o [ 70%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f16-tanh-neonfp16arith-expm1minus-rr1-p3h1ts-nr1recps.c.o [ 70%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f16-tanh-neonfp16arith-expm1minus-rr1-p3h1ts-nr1recpsadj.c.o [ 70%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f16-tanh-neonfp16arith-expm1minus-rr1-p3h1ts-recpe.c.o [ 70%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f16-tanh-neonfp16arith-expm1minus-rr1-p3h1ts-recpeadj.c.o [ 70%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f16-tanh-neonfp16arith-expm1minus-rr1-p3h2ts-nr1fma.c.o [ 70%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f16-tanh-neonfp16arith-expm1minus-rr1-p3h2ts-nr1fmaadj.c.o [ 70%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f16-tanh-neonfp16arith-expm1minus-rr1-p3h2ts-nr1recps.c.o [ 70%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f16-tanh-neonfp16arith-expm1minus-rr1-p3h2ts-nr1recpsadj.c.o [ 70%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f16-tanh-neonfp16arith-expm1minus-rr1-p3h2ts-recpe.c.o [ 70%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f16-tanh-neonfp16arith-expm1minus-rr1-p3h2ts-recpeadj.c.o [ 70%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc4w-gemm/gen/qd8-f16-qc4w-gemm-1x16-minmax-neonfp16arith-mlal-lane-prfm.c.o [ 70%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc4w-gemm/gen/qd8-f16-qc4w-gemm-1x16-minmax-neonfp16arith-mlal-lane.c.o [ 70%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc4w-gemm/gen/qd8-f16-qc4w-gemm-2x16-minmax-neonfp16arith-mlal-lane-prfm.c.o [ 70%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc4w-gemm/gen/qd8-f16-qc4w-gemm-2x16-minmax-neonfp16arith-mlal-lane.c.o [ 70%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc4w-gemm/gen/qd8-f16-qc4w-gemm-3x16-minmax-neonfp16arith-mlal-lane-prfm.c.o [ 70%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc4w-gemm/gen/qd8-f16-qc4w-gemm-3x16-minmax-neonfp16arith-mlal-lane.c.o [ 70%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc4w-gemm/gen/qd8-f16-qc4w-gemm-4x16-minmax-neonfp16arith-mlal-lane-prfm.c.o [ 70%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/sparse/SparseTensorMath.cpp.o [ 70%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc4w-gemm/gen/qd8-f16-qc4w-gemm-4x16-minmax-neonfp16arith-mlal-lane.c.o [ 70%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc4w-gemm/gen/qd8-f16-qc4w-gemm-6x16-minmax-neonfp16arith-mlal-lane-prfm.c.o [ 70%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc4w-gemm/gen/qd8-f16-qc4w-gemm-6x16-minmax-neonfp16arith-mlal-lane.c.o [ 70%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/sparse/SparseUnaryOps.cpp.o [ 70%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc8w-gemm/gen/qd8-f16-qc8w-gemm-1x8c2s4-minmax-neonfp16arith.c.o [ 70%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc8w-gemm/gen/qd8-f16-qc8w-gemm-2x8c2s4-minmax-neonfp16arith.c.o [ 70%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc8w-igemm/gen/qd8-f16-qc8w-igemm-1x8c2s4-minmax-neonfp16arith-mlal.c.o [ 70%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc8w-igemm/gen/qd8-f16-qc8w-igemm-2x8c2s4-minmax-neonfp16arith-mlal.c.o [ 70%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-f16-vcvt/gen/qs8-f16-vcvt-neonfp16arith-u8.c.o [ 70%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-f16-vcvt/gen/qs8-f16-vcvt-neonfp16arith-u16.c.o [ 70%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-f16-vcvt/gen/qs8-f16-vcvt-neonfp16arith-u24.c.o [ 70%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-f16-vcvt/gen/qs8-f16-vcvt-neonfp16arith-u32.c.o [ 70%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vdiv-minmax-aarch64-neonfp16arith-u8.c.o [ 70%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vdiv-minmax-aarch64-neonfp16arith-u16.c.o [ 70%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vdivc-minmax-aarch64-neonfp16arith-u8.c.o [ 70%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vdivc-minmax-aarch64-neonfp16arith-u16.c.o [ 70%] 
Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vrdivc-minmax-aarch64-neonfp16arith-u8.c.o [ 70%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vrdivc-minmax-aarch64-neonfp16arith-u16.c.o [ 70%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vsigmoid/gen/f16-vsigmoid-aarch64-neonfp16arith-rr2-p2-div-u8.c.o [ 70%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vsigmoid/gen/f16-vsigmoid-aarch64-neonfp16arith-rr2-p2-div-u16.c.o [ 70%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vsigmoid/gen/f16-vsigmoid-aarch64-neonfp16arith-rr2-p2-div-u24.c.o [ 70%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vsigmoid/gen/f16-vsigmoid-aarch64-neonfp16arith-rr2-p2-div-u32.c.o [ 70%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vsigmoid/gen/f16-vsigmoid-aarch64-neonfp16arith-rr2-p2-div-u40.c.o [ 70%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vsigmoid/gen/f16-vsigmoid-aarch64-neonfp16arith-rr2-p2-div-u48.c.o [ 70%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vsigmoid/gen/f16-vsigmoid-aarch64-neonfp16arith-rr2-p2-div-u56.c.o [ 70%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vsigmoid/gen/f16-vsigmoid-aarch64-neonfp16arith-rr2-p2-div-u64.c.o [ 70%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/sparse/ValidateCompressedIndicesKernel.cpp.o [ 70%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vsqrt/gen/f16-vsqrt-aarch64-neonfp16arith-sqrt-u8.c.o [ 70%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/nested/NestedTensorAliases.cpp.o [ 70%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vsqrt/gen/f16-vsqrt-aarch64-neonfp16arith-sqrt-u16.c.o [ 70%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vsqrt/gen/f16-vsqrt-aarch64-neonfp16arith-sqrt-u32.c.o [ 70%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vtanh/gen/f16-vtanh-aarch64-neonfp16arith-expm1minus-rr1-p3h2ts-div-u8.c.o [ 70%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vtanh/gen/f16-vtanh-aarch64-neonfp16arith-expm1minus-rr1-p3h2ts-div-u16.c.o [ 70%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vtanh/gen/f16-vtanh-aarch64-neonfp16arith-expm1minus-rr1-p3h2ts-div-u24.c.o [ 70%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vtanh/gen/f16-vtanh-aarch64-neonfp16arith-expm1minus-rr1-p3h2ts-div-u32.c.o [ 70%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vtanh/gen/f16-vtanh-aarch64-neonfp16arith-expm1minus-rr1-p3h2ts-div-u40.c.o [ 70%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vtanh/gen/f16-vtanh-aarch64-neonfp16arith-expm1minus-rr1-p3h2ts-div-u48.c.o [ 70%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vtanh/gen/f16-vtanh-aarch64-neonfp16arith-expm1minus-rr1-p3h2ts-div-u56.c.o [ 70%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vtanh/gen/f16-vtanh-aarch64-neonfp16arith-expm1minus-rr1-p3h2ts-div-u64.c.o [ 70%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vtanh/gen/f16-vtanh-aarch64-neonfp16arith-expm1minus-rr1-p3h2ts-div-u72.c.o [ 70%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vtanh/gen/f16-vtanh-aarch64-neonfp16arith-expm1minus-rr1-p3h2ts-div-u80.c.o [ 70%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f16-sigmoid-aarch64-neonfp16arith-rr1-p2-div.c.o [ 70%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f16-sigmoid-aarch64-neonfp16arith-rr1-p3-div.c.o [ 71%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f16-sigmoid-aarch64-neonfp16arith-rr2-p2-div.c.o [ 71%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f16-sigmoid-aarch64-neonfp16arith-rr2-p3-div.c.o [ 71%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f16-sqrt-aarch64-neonfp16arith-sqrt.c.o [ 71%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f16-tanh-aarch64-neonfp16arith-expm1minus-rr1-p3h1ts-div.c.o [ 71%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f16-tanh-aarch64-neonfp16arith-expm1minus-rr1-p3h2ts-div.c.o [ 71%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-1x8c4-minmax-neondot.c.o [ 71%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-1x16c4-minmax-neondot.c.o [ 71%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/nested/NestedTensorBackward.cpp.o [ 71%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-2x8c4-minmax-neondot.c.o [ 71%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-2x16c4-minmax-neondot.c.o [ 71%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-3x16c4-minmax-neondot.c.o [ 71%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-4x8c4-minmax-neondot.c.o [ 71%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-4x16c4-minmax-neondot.c.o [ 71%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/nested/NestedTensorBinaryOps.cpp.o [ 71%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/nested/NestedTensorFactories.cpp.o [ 71%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-6x8c4-minmax-neondot.c.o [ 71%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-6x16c4-minmax-neondot.c.o [ 71%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-1x8c4-minmax-neondot.c.o [ 71%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-1x8c8-minmax-neondot-ld64.c.o [ 71%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-1x16c4-minmax-neondot.c.o [ 71%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-1x16c8-minmax-neondot-ld64.c.o [ 71%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-2x8c4-minmax-neondot.c.o [ 71%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-2x16c4-minmax-neondot.c.o [ 71%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-3x8c4-minmax-neondot.c.o [ 71%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-3x16c4-minmax-neondot.c.o [ 71%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-4x8c4-minmax-neondot.c.o [ 71%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-4x16c4-minmax-neondot.c.o [ 71%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-5x8c4-minmax-neondot.c.o [ 71%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-5x16c4-minmax-neondot.c.o [ 71%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-6x8c4-minmax-neondot.c.o [ 71%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-6x16c4-minmax-neondot.c.o [ 71%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-1x8c4-minmax-neondot.c.o [ 71%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-1x8c8-minmax-neondot-ld64.c.o [ 71%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-1x16c4-minmax-neondot.c.o [ 71%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-1x16c8-minmax-neondot-ld64.c.o [ 71%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-1x32c4-minmax-neondot.c.o [ 71%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-2x8c4-minmax-neondot.c.o [ 71%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-2x16c4-minmax-neondot.c.o [ 71%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-2x32c4-minmax-neondot.c.o [ 71%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-4x8c4-minmax-neondot.c.o [ 71%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/nested/NestedTensorMath.cpp.o [ 71%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-4x16c4-minmax-neondot.c.o [ 71%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-4x32c4-minmax-neondot.c.o [ 71%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-6x8c4-minmax-neondot.c.o [ 71%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/nested/NestedTensorMatmul.cpp.o [ 71%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-6x16c4-minmax-neondot.c.o [ 71%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-6x32c4-minmax-neondot.c.o [ 71%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/nested/NestedTensorTransformerFunctions.cpp.o [ 71%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-8x8c4-minmax-neondot.c.o [ 71%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-8x16c4-minmax-neondot.c.o [ 71%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-8x32c4-minmax-neondot.c.o [ 71%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-1x8c4-minmax-fp32-neondot.c.o [ 71%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-1x8c8-minmax-fp32-neondot-ld64.c.o [ 71%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-1x16c4-minmax-fp32-neondot.c.o [ 71%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-1x16c8-minmax-fp32-neondot-ld64.c.o [ 71%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-4x8c4-minmax-fp32-neondot.c.o [ 71%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-4x16c4-minmax-fp32-neondot.c.o [ 71%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-6x8c4-minmax-fp32-neondot.c.o [ 71%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-6x16c4-minmax-fp32-neondot.c.o [ 71%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-8x8c4-minmax-fp32-neondot.c.o [ 71%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-8x16c4-minmax-fp32-neondot.c.o [ 71%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-1x8c4-minmax-fp32-neondot.c.o [ 71%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-1x8c8-minmax-fp32-neondot-ld64.c.o [ 71%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-1x16c4-minmax-fp32-neondot.c.o [ 71%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-1x16c8-minmax-fp32-neondot-ld64.c.o [ 71%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-4x8c4-minmax-fp32-neondot.c.o [ 72%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-4x16c4-minmax-fp32-neondot.c.o [ 72%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-6x8c4-minmax-fp32-neondot.c.o [ 72%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-6x16c4-minmax-fp32-neondot.c.o [ 72%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-8x8c4-minmax-fp32-neondot.c.o [ 72%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-8x16c4-minmax-fp32-neondot.c.o [ 72%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-1x8c4-minmax-rndnu-neondot.c.o [ 72%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-1x16c4-minmax-fp32-neondot.c.o [ 72%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/nested/NestedTensorUnaryOps.cpp.o [ 72%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-1x16c4-minmax-rndnu-neondot.c.o [ 72%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-1x32c4-minmax-rndnu-neondot.c.o [ 72%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-2x8c4-minmax-rndnu-neondot.c.o [ 72%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/nested/NestedTensorUtils.cpp.o [ 72%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-2x16c4-minmax-fp32-neondot.c.o [ 72%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/AffineQuantizer.cpp.o [ 72%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-2x16c4-minmax-rndnu-neondot.c.o [ 72%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-2x32c4-minmax-rndnu-neondot.c.o [ 72%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-3x8c4-minmax-rndnu-neondot.c.o [ 72%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-3x16c4-minmax-rndnu-neondot.c.o [ 72%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-3x32c4-minmax-rndnu-neondot.c.o [ 72%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-4x8c4-minmax-rndnu-neondot.c.o [ 72%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-4x16c4-minmax-fp32-neondot.c.o [ 72%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-4x16c4-minmax-rndnu-neondot.c.o [ 72%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-5x8c4-minmax-rndnu-neondot.c.o [ 72%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/AffineQuantizerBase.cpp.o [ 72%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-5x16c4-minmax-rndnu-neondot.c.o [ 72%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-6x8c4-minmax-rndnu-neondot.c.o [ 72%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-6x16c4-minmax-rndnu-neondot.c.o [ 72%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/Copy.cpp.o [ 72%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-8x8c4-minmax-rndnu-neondot.c.o [ 72%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/FakeQuantPerChannelAffine.cpp.o [ 72%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-8x16c4-minmax-rndnu-neondot.c.o [ 72%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-1x8c4-minmax-rndnu-neondot.c.o [ 72%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-1x16c4-minmax-fp32-neondot.c.o [ 72%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-1x16c4-minmax-rndnu-neondot.c.o [ 72%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-1x32c4-minmax-rndnu-neondot.c.o [ 72%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-2x8c4-minmax-rndnu-neondot.c.o [ 72%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-2x16c4-minmax-fp32-neondot.c.o [ 72%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-2x16c4-minmax-rndnu-neondot.c.o [ 72%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/FakeQuantPerTensorAffine.cpp.o [ 72%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-2x32c4-minmax-rndnu-neondot.c.o [ 72%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-3x8c4-minmax-rndnu-neondot.c.o [ 72%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-3x16c4-minmax-rndnu-neondot.c.o [ 72%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-3x32c4-minmax-rndnu-neondot.c.o [ 72%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-4x8c4-minmax-rndnu-neondot.c.o [ 72%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-4x16c4-minmax-fp32-neondot.c.o [ 72%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-4x16c4-minmax-rndnu-neondot.c.o [ 72%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-5x8c4-minmax-rndnu-neondot.c.o [ 72%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-5x16c4-minmax-rndnu-neondot.c.o [ 72%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-6x8c4-minmax-rndnu-neondot.c.o [ 72%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-6x16c4-minmax-rndnu-neondot.c.o [ 72%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-8x8c4-minmax-rndnu-neondot.c.o [ 72%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-8x16c4-minmax-rndnu-neondot.c.o [ 72%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/QTensor.cpp.o [ 72%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-1x8c8-minmax-aarch64-neondot-ld128.c.o [ 72%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-1x16c8-minmax-aarch64-neondot-ld128.c.o [ 72%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-1x8c8-minmax-aarch64-neondot-ld128.c.o [ 72%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-1x16c8-minmax-aarch64-neondot-ld128.c.o [ 72%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-1x8c8-minmax-fp32-aarch64-neondot-ld128.c.o [ 72%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-1x16c8-minmax-fp32-aarch64-neondot-ld128.c.o [ 72%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/TensorAdvancedIndexing.cpp.o [ 72%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-1x8c8-minmax-fp32-aarch64-neondot-ld128.c.o [ 72%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-1x16c8-minmax-fp32-aarch64-neondot-ld128.c.o [ 72%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-gemm/gen/f16-gemm-1x8-minmax-asm-aarch64-neonfp16arith-ld64.S.o [ 72%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-gemm/gen/f16-gemm-1x16-minmax-asm-aarch64-neonfp16arith-ld32.S.o [ 72%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-gemm/gen/f16-gemm-1x16-minmax-asm-aarch64-neonfp16arith-ld64.S.o [ 72%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-gemm/gen/f16-gemm-4x8-minmax-asm-aarch64-neonfp16arith-ld64.S.o [ 72%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-gemm/gen/f16-gemm-4x16-minmax-asm-aarch64-neonfp16arith-ld32.S.o [ 72%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-gemm/gen/f16-gemm-4x16-minmax-asm-aarch64-neonfp16arith-ld64.S.o [ 73%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-gemm/gen/f16-gemm-6x8-minmax-asm-aarch64-neonfp16arith-ld64.S.o [ 73%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-gemm/gen/f16-gemm-6x16-minmax-asm-aarch64-neonfp16arith-cortex-a55.S.o [ 73%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-gemm/gen/f16-gemm-6x16-minmax-asm-aarch64-neonfp16arith-cortex-a55r0.S.o [ 73%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-gemm/gen/f16-gemm-6x16-minmax-asm-aarch64-neonfp16arith-cortex-a75.S.o [ 73%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-gemm/gen/f16-gemm-6x16-minmax-asm-aarch64-neonfp16arith-ld32.S.o [ 73%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-gemm/gen/f16-gemm-6x16-minmax-asm-aarch64-neonfp16arith-ld64.S.o [ 73%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-gemm/gen/f16-gemm-8x8-minmax-asm-aarch64-neonfp16arith-ld64.S.o [ 73%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-gemm/gen/f16-gemminc-1x8-minmax-asm-aarch64-neonfp16arith-ld64.S.o [ 73%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-gemm/gen/f16-gemminc-1x16-minmax-asm-aarch64-neonfp16arith-ld32.S.o [ 73%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-gemm/gen/f16-gemminc-4x8-minmax-asm-aarch64-neonfp16arith-ld64.S.o [ 73%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-gemm/gen/f16-gemminc-4x16-minmax-asm-aarch64-neonfp16arith-ld32.S.o [ 73%] Building ASM object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-gemm/gen/f16-gemminc-6x8-minmax-asm-aarch64-neonfp16arith-ld64.S.o [ 73%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-gemm/gen/f16-gemminc-6x16-minmax-asm-aarch64-neonfp16arith-cortex-a55.S.o [ 73%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-gemm/gen/f16-gemminc-6x16-minmax-asm-aarch64-neonfp16arith-cortex-a75.S.o [ 73%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-gemm/gen/f16-gemminc-6x16-minmax-asm-aarch64-neonfp16arith-ld32.S.o [ 73%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-gemm/gen/f16-gemminc-8x8-minmax-asm-aarch64-neonfp16arith-ld64.S.o [ 73%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-igemm/f16-igemm-1x16-minmax-asm-aarch64-neonfp16arith-ld32.S.o [ 73%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-igemm/f16-igemm-1x16-minmax-asm-aarch64-neonfp16arith-ld64.S.o [ 73%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-igemm/f16-igemm-4x16-minmax-asm-aarch64-neonfp16arith-ld32.S.o [ 73%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-igemm/f16-igemm-4x16-minmax-asm-aarch64-neonfp16arith-ld64.S.o [ 73%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-igemm/f16-igemm-6x16-minmax-asm-aarch64-neonfp16arith-cortex-a55.S.o [ 73%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-igemm/f16-igemm-6x16-minmax-asm-aarch64-neonfp16arith-cortex-a55r0.S.o [ 73%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-igemm/f16-igemm-6x16-minmax-asm-aarch64-neonfp16arith-cortex-a75.S.o [ 73%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-igemm/f16-igemm-6x16-minmax-asm-aarch64-neonfp16arith-ld32.S.o [ 73%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-igemm/f16-igemm-6x16-minmax-asm-aarch64-neonfp16arith-ld64.S.o [ 73%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/f32-dwconv-9p4c-minmax-asm-aarch64-neonfma-cortex-a55.S.o [ 73%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/f32-dwconv-9p4c-minmax-asm-aarch64-neonfma.S.o [ 73%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-1x8-minmax-asm-aarch64-neon-ld128-acc2-prfm.S.o [ 73%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-1x8-minmax-asm-aarch64-neon-ld128-acc2.S.o [ 73%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-1x8-minmax-asm-aarch64-neonfma-cortex-a53-prfm.S.o [ 73%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-1x8-minmax-asm-aarch64-neonfma-cortex-a53.S.o [ 73%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-1x8-minmax-asm-aarch64-neonfma-cortex-a75-prfm.S.o [ 73%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-1x8-minmax-asm-aarch64-neonfma-cortex-a75.S.o [ 73%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-1x8-minmax-asm-aarch64-neonfma-ld64-acc2-prfm.S.o [ 73%] Building ASM object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-1x8-minmax-asm-aarch64-neonfma-ld64-acc2.S.o [ 73%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-1x8-minmax-asm-aarch64-neonfma-ld64-acc4-prfm.S.o [ 73%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/TensorCompare.cpp.o [ 73%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-1x8-minmax-asm-aarch64-neonfma-ld64-acc4.S.o [ 73%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-1x8-minmax-asm-aarch64-neonfma-ld64-prfm.S.o [ 73%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-1x8-minmax-asm-aarch64-neonfma-ld64.S.o [ 73%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-1x8-minmax-asm-aarch64-neonfma-ld128-acc2-prfm.S.o [ 73%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-1x8-minmax-asm-aarch64-neonfma-ld128-acc2.S.o [ 73%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-1x8-minmax-asm-aarch64-neonfma-ld128-acc4-prfm.S.o [ 73%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-1x8-minmax-asm-aarch64-neonfma-ld128-acc4.S.o [ 73%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-1x8-minmax-asm-aarch64-neonfma-ld128-prfm.S.o [ 73%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-1x8-minmax-asm-aarch64-neonfma-ld128.S.o [ 73%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-1x12-minmax-asm-aarch64-neonfma-cortex-a53.S.o [ 73%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-4x1-minmax-asm-aarch64-neonfma-ld64.S.o [ 73%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-4x1-minmax-asm-aarch64-neonfma-ld128.S.o [ 73%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-4x2-minmax-asm-aarch64-neonfma-cortex-a75-prfm.S.o [ 73%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-4x2-minmax-asm-aarch64-neonfma-cortex-a75.S.o [ 73%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-4x2-minmax-asm-aarch64-neonfma-ld64.S.o [ 73%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-4x2-minmax-asm-aarch64-neonfma-ld128.S.o [ 73%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-4x8-minmax-asm-aarch64-neonfma-cortex-a53-prfm.S.o [ 73%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-4x8-minmax-asm-aarch64-neonfma-cortex-a53.S.o [ 73%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-4x8-minmax-asm-aarch64-neonfma-cortex-a55.S.o [ 73%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-4x8-minmax-asm-aarch64-neonfma-cortex-a75-prfm.S.o [ 73%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-4x8-minmax-asm-aarch64-neonfma-cortex-a75.S.o [ 73%] Building ASM 
object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-4x8-minmax-asm-aarch64-neonfma-ld64.S.o [ 73%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-4x8-minmax-asm-aarch64-neonfma-ld128.S.o [ 73%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-4x12-minmax-asm-aarch64-neonfma-cortex-a53.S.o [ 74%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-5x8-minmax-asm-aarch64-neonfma-cortex-a75-prfm.S.o [ 74%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-5x8-minmax-asm-aarch64-neonfma-cortex-a75.S.o [ 74%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-6x8-minmax-asm-aarch64-neonfma-cortex-a53-prfm.S.o [ 74%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-6x8-minmax-asm-aarch64-neonfma-cortex-a53.S.o [ 74%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-6x8-minmax-asm-aarch64-neonfma-cortex-a55.S.o [ 74%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-6x8-minmax-asm-aarch64-neonfma-cortex-a73.S.o [ 74%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-6x8-minmax-asm-aarch64-neonfma-cortex-a75-prfm.S.o [ 74%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-6x8-minmax-asm-aarch64-neonfma-cortex-a75.S.o [ 74%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-6x8-minmax-asm-aarch64-neonfma-ld64.S.o [ 74%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-6x8-minmax-asm-aarch64-neonfma-ld128.S.o [ 74%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-goi-1x8-minmax-asm-aarch64-neonfma-ld128-prfm.S.o [ 74%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-goi-1x8-minmax-asm-aarch64-neonfma-ld128.S.o [ 74%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-goi-4x8-minmax-asm-aarch64-neonfma-ld128.S.o [ 74%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemminc/gen/f32-gemminc-1x8-minmax-asm-aarch64-neonfma-cortex-a53.S.o [ 74%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemminc/gen/f32-gemminc-1x8-minmax-asm-aarch64-neonfma-cortex-a75-prfm.S.o [ 74%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemminc/gen/f32-gemminc-1x8-minmax-asm-aarch64-neonfma-cortex-a75.S.o [ 74%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemminc/gen/f32-gemminc-1x8-minmax-asm-aarch64-neonfma-ld64.S.o [ 74%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemminc/gen/f32-gemminc-1x12-minmax-asm-aarch64-neonfma-cortex-a53.S.o [ 74%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemminc/gen/f32-gemminc-4x8-minmax-asm-aarch64-neonfma-cortex-a53.S.o [ 74%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemminc/gen/f32-gemminc-4x8-minmax-asm-aarch64-neonfma-cortex-a55.S.o [ 74%] Building ASM object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemminc/gen/f32-gemminc-4x8-minmax-asm-aarch64-neonfma-cortex-a75-prfm.S.o [ 74%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemminc/gen/f32-gemminc-4x8-minmax-asm-aarch64-neonfma-cortex-a75.S.o [ 74%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemminc/gen/f32-gemminc-4x8-minmax-asm-aarch64-neonfma-ld64.S.o [ 74%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemminc/gen/f32-gemminc-4x8-minmax-asm-aarch64-neonfma-ld128.S.o [ 74%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemminc/gen/f32-gemminc-4x12-minmax-asm-aarch64-neonfma-cortex-a53.S.o [ 74%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemminc/gen/f32-gemminc-5x8-minmax-asm-aarch64-neonfma-cortex-a75-prfm.S.o [ 74%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemminc/gen/f32-gemminc-5x8-minmax-asm-aarch64-neonfma-cortex-a75.S.o [ 74%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemminc/gen/f32-gemminc-6x8-minmax-asm-aarch64-neonfma-cortex-a53.S.o [ 74%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemminc/gen/f32-gemminc-6x8-minmax-asm-aarch64-neonfma-cortex-a55.S.o [ 74%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemminc/gen/f32-gemminc-6x8-minmax-asm-aarch64-neonfma-cortex-a73.S.o [ 74%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemminc/gen/f32-gemminc-6x8-minmax-asm-aarch64-neonfma-cortex-a75-prfm.S.o [ 74%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemminc/gen/f32-gemminc-6x8-minmax-asm-aarch64-neonfma-cortex-a75.S.o [ 74%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemminc/gen/f32-gemminc-6x8-minmax-asm-aarch64-neonfma-ld64.S.o [ 74%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemminc/gen/f32-gemminc-6x8-minmax-asm-aarch64-neonfma-ld128.S.o [ 74%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/f32-igemm-1x12-minmax-asm-aarch64-neonfma-cortex-a53.S.o [ 74%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/f32-igemm-4x8-minmax-asm-aarch64-neonfma-cortex-a55.S.o [ 74%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/f32-igemm-4x12-minmax-asm-aarch64-neonfma-cortex-a53.S.o [ 74%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/f32-igemm-6x8-minmax-asm-aarch64-neonfma-cortex-a55.S.o [ 74%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/f32-igemm-6x8-minmax-asm-aarch64-neonfma-cortex-a73.S.o [ 74%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-1x8-minmax-asm-aarch64-neonfma-cortex-a53-prfm.S.o [ 74%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-1x8-minmax-asm-aarch64-neonfma-cortex-a53.S.o [ 74%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-1x8-minmax-asm-aarch64-neonfma-cortex-a75-prfm.S.o [ 74%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-1x8-minmax-asm-aarch64-neonfma-cortex-a75.S.o [ 74%] Building ASM object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-1x8-minmax-asm-aarch64-neonfma-ld64-prfm.S.o [ 74%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-1x8-minmax-asm-aarch64-neonfma-ld64.S.o [ 74%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-4x2-minmax-asm-aarch64-neonfma-cortex-a75-prfm.S.o [ 74%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-4x2-minmax-asm-aarch64-neonfma-cortex-a75.S.o [ 74%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-4x2-minmax-asm-aarch64-neonfma-ld64.S.o [ 74%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-4x8-minmax-asm-aarch64-neonfma-cortex-a53-prfm.S.o [ 74%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-4x8-minmax-asm-aarch64-neonfma-cortex-a53.S.o [ 74%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-4x8-minmax-asm-aarch64-neonfma-cortex-a75-prfm.S.o [ 74%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-4x8-minmax-asm-aarch64-neonfma-cortex-a75.S.o [ 74%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-4x8-minmax-asm-aarch64-neonfma-ld64.S.o [ 74%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-4x8-minmax-asm-aarch64-neonfma-ld128.S.o [ 74%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-5x8-minmax-asm-aarch64-neonfma-cortex-a75-prfm.S.o [ 74%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-5x8-minmax-asm-aarch64-neonfma-cortex-a75.S.o [ 74%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-6x8-minmax-asm-aarch64-neonfma-cortex-a53-prfm.S.o [ 74%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-6x8-minmax-asm-aarch64-neonfma-cortex-a53.S.o [ 74%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-6x8-minmax-asm-aarch64-neonfma-cortex-a75-prfm.S.o [ 74%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-6x8-minmax-asm-aarch64-neonfma-cortex-a75.S.o [ 75%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-6x8-minmax-asm-aarch64-neonfma-ld64.S.o [ 75%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-6x8-minmax-asm-aarch64-neonfma-ld128.S.o [ 75%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-ppmm/gen/f32-ppmm-4x8-minmax-asm-aarch64-neonfma-cortex-a75-prfm.S.o [ 75%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-ppmm/gen/f32-ppmm-4x8-minmax-asm-aarch64-neonfma-cortex-a75.S.o [ 75%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-ppmm/gen/f32-ppmm-4x8-minmax-asm-aarch64-neonfma-ld128-prfm.S.o [ 75%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-ppmm/gen/f32-ppmm-4x8-minmax-asm-aarch64-neonfma-ld128.S.o [ 75%] Building ASM object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-ppmm/gen/f32-ppmm-8x8-minmax-asm-aarch64-neonfma-cortex-a75-prfm.S.o [ 75%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-ppmm/gen/f32-ppmm-8x8-minmax-asm-aarch64-neonfma-cortex-a75.S.o [ 75%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-ppmm/gen/f32-ppmm-8x8-minmax-asm-aarch64-neonfma-ld128-prfm.S.o [ 75%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-ppmm/gen/f32-ppmm-8x8-minmax-asm-aarch64-neonfma-ld128.S.o [ 75%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc4w-gemm/gen/f32-qc4w-gemm-1x8-minmax-asm-aarch64-neon-ld128-acc2-prfm.S.o [ 75%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc4w-gemm/gen/f32-qc4w-gemm-1x8-minmax-asm-aarch64-neon-ld128-acc2.S.o [ 75%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc4w-gemm/gen/f32-qc4w-gemm-1x8-minmax-asm-aarch64-neonfma-ld64-acc2-prfm.S.o [ 75%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc4w-gemm/gen/f32-qc4w-gemm-1x8-minmax-asm-aarch64-neonfma-ld64-acc2.S.o [ 75%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc4w-gemm/gen/f32-qc4w-gemm-1x8-minmax-asm-aarch64-neonfma-ld64-acc4-prfm.S.o [ 75%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc4w-gemm/gen/f32-qc4w-gemm-1x8-minmax-asm-aarch64-neonfma-ld64-acc4.S.o [ 75%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc4w-gemm/gen/f32-qc4w-gemm-1x8-minmax-asm-aarch64-neonfma-ld64-prfm.S.o [ 75%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc4w-gemm/gen/f32-qc4w-gemm-1x8-minmax-asm-aarch64-neonfma-ld64.S.o [ 75%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc4w-gemm/gen/f32-qc4w-gemm-1x8-minmax-asm-aarch64-neonfma-ld128-acc2-prfm.S.o [ 75%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc4w-gemm/gen/f32-qc4w-gemm-1x8-minmax-asm-aarch64-neonfma-ld128-acc2.S.o [ 75%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc4w-gemm/gen/f32-qc4w-gemm-1x8-minmax-asm-aarch64-neonfma-ld128-acc4-prfm.S.o [ 75%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc4w-gemm/gen/f32-qc4w-gemm-1x8-minmax-asm-aarch64-neonfma-ld128-acc4.S.o [ 75%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc4w-gemm/gen/f32-qc4w-gemm-1x8-minmax-asm-aarch64-neonfma-ld128-prfm.S.o [ 75%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc4w-gemm/gen/f32-qc4w-gemm-1x8-minmax-asm-aarch64-neonfma-ld128.S.o [ 75%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc4w-gemm/gen/f32-qc4w-gemm-4x1-minmax-asm-aarch64-neonfma-ld64.S.o [ 75%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc4w-gemm/gen/f32-qc4w-gemm-4x1-minmax-asm-aarch64-neonfma-ld128.S.o [ 75%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc4w-gemm/gen/f32-qc4w-gemm-4x2-minmax-asm-aarch64-neonfma-ld64.S.o [ 75%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc4w-gemm/gen/f32-qc4w-gemm-4x2-minmax-asm-aarch64-neonfma-ld128.S.o [ 75%] Building ASM object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc4w-gemm/gen/f32-qc4w-gemm-4x8-minmax-asm-aarch64-neonfma-ld64.S.o [ 75%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc4w-gemm/gen/f32-qc4w-gemm-4x8-minmax-asm-aarch64-neonfma-ld128.S.o [ 75%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc4w-gemm/gen/f32-qc4w-gemm-6x8-minmax-asm-aarch64-neonfma-ld64.S.o [ 75%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc4w-gemm/gen/f32-qc4w-gemm-6x8-minmax-asm-aarch64-neonfma-ld128.S.o [ 75%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-1x8-minmax-asm-aarch64-neon-ld128-acc2-prfm.S.o [ 75%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-1x8-minmax-asm-aarch64-neon-ld128-acc2.S.o [ 75%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-1x8-minmax-asm-aarch64-neonfma-ld64-acc2-prfm.S.o [ 75%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-1x8-minmax-asm-aarch64-neonfma-ld64-acc2.S.o [ 75%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-1x8-minmax-asm-aarch64-neonfma-ld64-acc4-prfm.S.o [ 75%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-1x8-minmax-asm-aarch64-neonfma-ld64-acc4.S.o [ 75%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-1x8-minmax-asm-aarch64-neonfma-ld64-prfm.S.o [ 75%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-1x8-minmax-asm-aarch64-neonfma-ld64.S.o [ 75%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-1x8-minmax-asm-aarch64-neonfma-ld128-acc2-prfm.S.o [ 75%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-1x8-minmax-asm-aarch64-neonfma-ld128-acc2.S.o [ 75%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-1x8-minmax-asm-aarch64-neonfma-ld128-acc4-prfm.S.o [ 75%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-1x8-minmax-asm-aarch64-neonfma-ld128-acc4.S.o [ 75%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-1x8-minmax-asm-aarch64-neonfma-ld128-prfm.S.o [ 75%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-1x8-minmax-asm-aarch64-neonfma-ld128.S.o [ 75%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-4x1-minmax-asm-aarch64-neonfma-ld64.S.o [ 75%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-4x1-minmax-asm-aarch64-neonfma-ld128.S.o [ 75%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-4x2-minmax-asm-aarch64-neonfma-ld64.S.o [ 75%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-4x2-minmax-asm-aarch64-neonfma-ld128.S.o [ 75%] Building ASM object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-4x8-minmax-asm-aarch64-neonfma-ld64.S.o [ 75%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-4x8-minmax-asm-aarch64-neonfma-ld128.S.o [ 75%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-6x8-minmax-asm-aarch64-neonfma-ld64.S.o [ 75%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-6x8-minmax-asm-aarch64-neonfma-ld128.S.o [ 75%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc8w-gemm/gen/qd8-f16-qc8w-gemm-4x16c4-minmax-asm-aarch64-neondot-ld128.S.o [ 75%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc8w-gemm/gen/qd8-f16-qc8w-gemm-4x16c4-minmax-asm-aarch64-neondotfp16arith-cortex-a55.S.o [ 75%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc8w-igemm/gen/qd8-f16-qc8w-igemm-4x16c4-minmax-asm-aarch64-neondot-cortex-a55.S.o [ 75%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc8w-igemm/gen/qd8-f16-qc8w-igemm-4x16c4-minmax-asm-aarch64-neondot-ld128.S.o [ 75%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-4x16c4-minmax-asm-aarch64-neondot-cortex-a55.S.o [ 76%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-4x16c4-minmax-asm-aarch64-neondot-ld64.S.o [ 76%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-4x16c4-minmax-asm-aarch64-neondot-ld128.S.o [ 76%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-4x16c4-minmax-asm-aarch64-neondot-cortex-a55.S.o [ 76%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-4x16c4-minmax-asm-aarch64-neondot-ld128.S.o [ 76%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-1x8c8-minmax-fp32-asm-aarch64-neon-mlal-cortex-a53-prfm.S.o [ 76%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-1x8c8-minmax-fp32-asm-aarch64-neon-mlal-cortex-a53.S.o [ 76%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-1x8c8-minmax-fp32-asm-aarch64-neon-mlal-prfm.S.o [ 76%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-1x8c8-minmax-fp32-asm-aarch64-neon-mlal.S.o [ 76%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-1x16c4-minmax-fp32-asm-aarch64-neondot-ld32.S.o [ 76%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-1x16c4-minmax-fp32-asm-aarch64-neondot-ld64.S.o [ 76%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-2x8c8-minmax-fp32-asm-aarch64-neon-mlal-cortex-a53-prfm.S.o [ 76%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-2x8c8-minmax-fp32-asm-aarch64-neon-mlal-cortex-a53.S.o [ 76%] Building ASM object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-2x8c8-minmax-fp32-asm-aarch64-neon-mlal-prfm.S.o [ 76%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-2x8c8-minmax-fp32-asm-aarch64-neon-mlal.S.o [ 76%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-2x8c8-minmax-fp32-asm-aarch64-neon-mull.S.o [ 76%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-2x8c16-minmax-fp32-asm-aarch64-neon-mlal.S.o [ 76%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-4x16-minmax-fp32-asm-aarch64-neon-mlal-lane-cortex-a53-prfm.S.o [ 76%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-4x16-minmax-fp32-asm-aarch64-neon-mlal-lane-cortex-a53.S.o [ 76%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-4x16-minmax-fp32-asm-aarch64-neon-mlal-lane-ld64-prfm.S.o [ 76%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-4x16-minmax-fp32-asm-aarch64-neon-mlal-lane-ld64.S.o [ 76%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-4x16c4-minmax-fp32-asm-aarch64-neondot-cortex-a55.S.o [ 76%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-4x16c4-minmax-fp32-asm-aarch64-neondot-ld32.S.o [ 76%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-4x16c4-minmax-fp32-asm-aarch64-neondot-ld64.S.o [ 76%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-4x16c4-minmax-fp32-asm-aarch64-neondot-ld128.S.o [ 76%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-1x8c8-minmax-fp32-asm-aarch64-neon-mlal-cortex-a53-prfm.S.o [ 76%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-1x8c8-minmax-fp32-asm-aarch64-neon-mlal-cortex-a53.S.o [ 76%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-1x8c8-minmax-fp32-asm-aarch64-neon-mlal-prfm.S.o [ 76%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-1x8c8-minmax-fp32-asm-aarch64-neon-mlal.S.o [ 76%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/TensorFactories.cpp.o [ 76%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-2x8c8-minmax-fp32-asm-aarch64-neon-mlal-cortex-a53-prfm.S.o [ 76%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-2x8c8-minmax-fp32-asm-aarch64-neon-mlal-cortex-a53.S.o [ 76%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-2x8c8-minmax-fp32-asm-aarch64-neon-mlal-prfm.S.o [ 76%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-2x8c8-minmax-fp32-asm-aarch64-neon-mlal.S.o [ 76%] Building ASM object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-2x8c16-minmax-fp32-asm-aarch64-neon-mlal.S.o [ 76%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-4x16-minmax-fp32-asm-aarch64-neon-mlal-lane-cortex-a53-prfm.S.o [ 76%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-4x16-minmax-fp32-asm-aarch64-neon-mlal-lane-cortex-a53.S.o [ 76%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-4x16-minmax-fp32-asm-aarch64-neon-mlal-lane-ld64-prfm.S.o [ 76%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-4x16-minmax-fp32-asm-aarch64-neon-mlal-lane-ld64.S.o [ 76%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-4x16c4-minmax-fp32-asm-aarch64-neondot-cortex-a55.S.o [ 76%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-4x16c4-minmax-fp32-asm-aarch64-neondot-ld64.S.o [ 76%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-4x16c4-minmax-fp32-asm-aarch64-neondot-ld128.S.o [ 76%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-4x8c4-minmax-rndnu-asm-aarch64-neondot-cortex-a55.S.o [ 76%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-4x8c4-minmax-rndnu-asm-aarch64-neondot-ld128.S.o [ 76%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-4x16-minmax-rndnu-asm-aarch64-neon-mlal-lane-cortex-a53-prfm.S.o [ 76%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-4x16-minmax-rndnu-asm-aarch64-neon-mlal-lane-cortex-a53.S.o [ 76%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-4x16-minmax-rndnu-asm-aarch64-neon-mlal-lane-cortex-a75-prfm.S.o [ 76%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-4x16-minmax-rndnu-asm-aarch64-neon-mlal-lane-cortex-a75.S.o [ 76%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-4x16-minmax-rndnu-asm-aarch64-neon-mlal-lane-ld64-prfm.S.o [ 76%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-4x16-minmax-rndnu-asm-aarch64-neon-mlal-lane-ld64.S.o [ 76%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-4x16c4-minmax-fp32-asm-aarch64-neondot-cortex-a55.S.o [ 76%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-4x16c4-minmax-fp32-asm-aarch64-neondot-ld128.S.o [ 76%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-4x16c4-minmax-rndnu-asm-aarch64-neondot-cortex-a55.S.o [ 76%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-4x16c4-minmax-rndnu-asm-aarch64-neondot-ld128.S.o [ 76%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-4x8c4-minmax-rndnu-asm-aarch64-neondot-cortex-a55.S.o [ 76%] Building ASM object 
[ 76%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-4x8c4-minmax-rndnu-asm-aarch64-neondot-ld128.S.o
[ 76%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-4x16-minmax-rndnu-asm-aarch64-neon-mlal-lane-cortex-a53-prfm.S.o
[ 76%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-4x16-minmax-rndnu-asm-aarch64-neon-mlal-lane-cortex-a53.S.o
[ 76%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-4x16-minmax-rndnu-asm-aarch64-neon-mlal-lane-cortex-a75-prfm.S.o
[ 76%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-4x16-minmax-rndnu-asm-aarch64-neon-mlal-lane-cortex-a75.S.o
[ 76%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-4x16-minmax-rndnu-asm-aarch64-neon-mlal-lane-ld64-prfm.S.o
[ 76%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-4x16-minmax-rndnu-asm-aarch64-neon-mlal-lane-ld64.S.o
[ 77%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-4x16c4-minmax-fp32-asm-aarch64-neondot-cortex-a55.S.o
[ 77%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-4x16c4-minmax-fp32-asm-aarch64-neondot-ld128.S.o
[ 77%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-4x16c4-minmax-rndnu-asm-aarch64-neondot-cortex-a55.S.o
[ 77%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-4x16c4-minmax-rndnu-asm-aarch64-neondot-ld128.S.o
[ 77%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/tables/exp2-k-over-64.c.o
[ 77%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/tables/exp2-k-over-2048.c.o
[ 77%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/tables/exp2minus-k-over-4.c.o
[ 77%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/tables/exp2minus-k-over-8.c.o
[ 77%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/tables/exp2minus-k-over-16.c.o
[ 77%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/tables/exp2minus-k-over-32.c.o
[ 77%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/tables/exp2minus-k-over-64.c.o
[ 77%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/tables/exp2minus-k-over-2048.c.o
[ 77%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/tables/vlog.c.o
[ 77%] Built target microkernels-all
[ 77%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/cpu/AdaptiveAveragePooling.cpp.o
[ 77%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/cpu/AveragePool2d.cpp.o
[ 77%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/cpu/AveragePool3d.cpp.o
[ 77%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/cpu/BinaryOps.cpp.o
[ 77%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/cpu/ChannelShuffle.cpp.o
[ 77%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/cpu/IntReprQuant.cpp.o
[ 77%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/cpu/LinearUnpackImpl.cpp.o
[ 77%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/cpu/MakePerTensorQuantizedTensor.cpp.o
[ 77%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/cpu/Normalization.cpp.o
[ 77%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/cpu/Pooling.cpp.o
[ 77%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/cpu/ReduceOps.cpp.o
[ 77%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/cpu/RuyUtils.cpp.o
[ 77%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/cpu/Sorting.cpp.o
[ 77%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/cpu/TensorOperators.cpp.o
[ 77%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/cpu/TensorShape.cpp.o
[ 77%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/cpu/UpSampleBilinear2d.cpp.o
[ 77%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/cpu/UpSampleNearest2d.cpp.o
[ 77%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/cpu/UpSampleNearest3d.cpp.o
[ 78%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/cpu/XnnpackUtils.cpp.o
[ 78%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/cpu/fbgemm_utils.cpp.o
[ 78%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/cpu/fused_obs_fake_quant.cpp.o
[ 78%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/cpu/init_qnnpack.cpp.o
[ 78%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/cpu/qclamp.cpp.o
[ 78%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/cpu/qconv.cpp.o
[ 78%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/cpu/qconv_dynamic.cpp.o
[ 78%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/cpu/qconv_prepack.cpp.o
[ 78%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/cpu/qconv_unpack_impl.cpp.o
[ 78%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/cpu/qdropout.cpp.o
[ 78%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/cpu/qelu.cpp.o
[ 78%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/cpu/qembeddingbag.cpp.o
[ 78%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/cpu/qembeddingbag_prepack.cpp.o
[ 78%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/cpu/qembeddingbag_unpack.cpp.o
[ 78%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/cpu/qgelu.cpp.o
[ 78%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/cpu/qhardsigmoid.cpp.o
[ 78%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/cpu/qhardswish.cpp.o
[ 78%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/cpu/qlinear.cpp.o
[ 78%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/cpu/qlinear_dynamic.cpp.o
[ 78%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/cpu/qlinear_prepack.cpp.o
[ 78%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/cpu/qmatmul.cpp.o
[ 78%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/cpu/qmul.cpp.o
[ 78%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/cpu/qnormalization.cpp.o
[ 78%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/cpu/qrelu.cpp.o
[ 78%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/cpu/qsigmoid.cpp.o
[ 78%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/cpu/qsoftmax.cpp.o
[ 78%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/cpu/qtanh.cpp.o
[ 78%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/cpu/qthreshold.cpp.o
[ 78%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/library.cpp.o
[ 78%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/qconv_unpack.cpp.o
[ 78%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/qlinear_unpack.cpp.o
[ 78%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/mkl/LinearAlgebra.cpp.o
[ 78%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/mkl/SparseBlasImpl.cpp.o
[ 78%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/mkl/SparseCsrLinearAlgebra.cpp.o
[ 78%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/mkl/SpectralOps.cpp.o
[ 78%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/mkldnn/BinaryOps.cpp.o
[ 78%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/mkldnn/Conv.cpp.o
[ 78%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/mkldnn/ConvPrepack.cpp.o
[ 78%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/mkldnn/Copy.cpp.o
[ 78%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/mkldnn/Gelu.cpp.o
[ 78%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/mkldnn/IDeepRegistration.cpp.o
[ 78%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/mkldnn/Linear.cpp.o
[ 78%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/mkldnn/MKLDNNCommon.cpp.o
[ 78%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/mkldnn/MKLDNNConversions.cpp.o
[ 78%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/mkldnn/Matmul.cpp.o
[ 78%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/mkldnn/MkldnnTensorMath.cpp.o
[ 78%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/mkldnn/Normalization.cpp.o
[ 78%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/mkldnn/OpContext.cpp.o
[ 78%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/mkldnn/Pooling.cpp.o
[ 78%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/mkldnn/Prelu.cpp.o
[ 78%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/mkldnn/RNN.cpp.o
[ 78%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/mkldnn/RegisterMkldnnOpContextClass.cpp.o
[ 78%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/mkldnn/Relu.cpp.o
[ 78%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/mkldnn/SoftMax.cpp.o
[ 78%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/mkldnn/TensorFactories.cpp.o
[ 78%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/mkldnn/TensorShape.cpp.o
[ 78%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/mkldnn/UnaryOps.cpp.o
[ 78%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/mkldnn/Utils.cpp.o
[ 78%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/transformers/attention.cpp.o
[ 79%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/transformers/sdp_utils_cpp.cpp.o
[ 79%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/transformers/transformer.cpp.o
[ 79%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/utils/Factory.cpp.o
[ 79%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/xnnpack/Activation.cpp.o
[ 79%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/xnnpack/AveragePooling.cpp.o
[ 79%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/xnnpack/ChannelShuffle.cpp.o
[ 79%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/xnnpack/Convolution.cpp.o
[ 79%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/xnnpack/Init.cpp.o
[ 79%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/xnnpack/Linear.cpp.o
[ 79%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/xnnpack/MaxPooling.cpp.o
[ 79%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/xnnpack/OpContext.cpp.o
[ 79%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/xnnpack/RegisterOpContextClass.cpp.o
[ 79%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/xnnpack/Shim.cpp.o
[ 79%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/CompositeViewCopyKernels.cpp.o
[ 79%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/Functions.cpp.o
[ 79%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/Operators_0.cpp.o
[ 79%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/Operators_1.cpp.o
[ 79%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/Operators_2.cpp.o
[ 79%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/Operators_3.cpp.o
[ 79%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/Operators_4.cpp.o
[ 79%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/RegisterBackendSelect.cpp.o
[ 79%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/RegisterCPU.cpp.o
[ 79%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/RegisterCompositeExplicitAutograd.cpp.o
[ 79%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/RegisterCompositeExplicitAutogradNonFunctional.cpp.o
[ 79%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/RegisterCompositeImplicitAutograd.cpp.o
[ 79%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/RegisterCompositeImplicitAutogradNestedTensor.cpp.o
[ 79%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/RegisterFunctionalization_0.cpp.o
[ 79%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/RegisterFunctionalization_1.cpp.o
[ 79%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/RegisterFunctionalization_2.cpp.o
[ 79%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/RegisterFunctionalization_3.cpp.o
[ 79%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/RegisterMeta.cpp.o
[ 79%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/RegisterMkldnnCPU.cpp.o
[ 79%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/RegisterNestedTensorCPU.cpp.o
[ 79%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/RegisterNestedTensorMeta.cpp.o
[ 79%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/RegisterQuantizedCPU.cpp.o
[ 79%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/RegisterQuantizedMeta.cpp.o
[ 79%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/RegisterSchema.cpp.o
[ 79%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/RegisterSparseCPU.cpp.o
[ 79%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/RegisterSparseCsrCPU.cpp.o
[ 79%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/RegisterSparseCsrMeta.cpp.o
[ 79%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/RegisterSparseMeta.cpp.o
[ 79%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/RegisterZeroTensor.cpp.o
[ 79%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/UfuncCPU_add.cpp.o
[ 79%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/core/ATenOpList.cpp.o
[ 79%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/core/TensorMethods.cpp.o
[ 79%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/quantized/QTensorImpl.cpp.o
[ 79%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/quantized/Quantizer.cpp.o
[ 79%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/nnapi/nnapi_bind.cpp.o
[ 79%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/nnapi/nnapi_model_loader.cpp.o
[ 79%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/nnapi/nnapi_register.cpp.o
[ 79%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/nnapi/nnapi_wrapper.cpp.o
[ 79%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/UfuncCPUKernel_add.cpp.DEFAULT.cpp.o
[ 79%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/cpu/kernels/QuantizedOpKernels.cpp.DEFAULT.cpp.o
[ 79%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/spherical_bessel_j0.cpp.DEFAULT.cpp.o
[ 79%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/scaled_modified_bessel_k1.cpp.DEFAULT.cpp.o
[ 79%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/scaled_modified_bessel_k0.cpp.DEFAULT.cpp.o
[ 79%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/layer_norm_kernel.cpp.DEFAULT.cpp.o
[ 79%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/int8mm_kernel.cpp.DEFAULT.cpp.o
[ 79%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/int4mm_kernel.cpp.DEFAULT.cpp.o
[ 79%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/group_norm_kernel.cpp.DEFAULT.cpp.o
[ 80%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/batch_norm_kernel.cpp.DEFAULT.cpp.o
[ 80%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/airy_ai.cpp.DEFAULT.cpp.o
[ 80%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/WeightNormKernel.cpp.DEFAULT.cpp.o
[ 80%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/UpSampleMoreKernel.cpp.DEFAULT.cpp.o
[ 80%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/UpSampleKernel.cpp.DEFAULT.cpp.o
[ 80%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/UnfoldBackwardKernel.cpp.DEFAULT.cpp.o
[ 80%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/Unfold2d.cpp.DEFAULT.cpp.o
[ 80%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/UnaryOpsKernel.cpp.DEFAULT.cpp.o
[ 80%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/TensorCompareKernel.cpp.DEFAULT.cpp.o
[ 80%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/SumKernel.cpp.DEFAULT.cpp.o
[ 80%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/StackKernel.cpp.DEFAULT.cpp.o
[ 80%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/SpmmReduceKernel.cpp.DEFAULT.cpp.o
[ 80%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/SparseFactories.cpp.DEFAULT.cpp.o
[ 80%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/SortingKernel.cpp.DEFAULT.cpp.o
[ 80%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/SoftMaxKernel.cpp.DEFAULT.cpp.o
[ 80%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/ScatterGatherKernel.cpp.DEFAULT.cpp.o
[ 80%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/SampledAddmmKernel.cpp.DEFAULT.cpp.o
[ 80%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/RenormKernel.cpp.DEFAULT.cpp.o
[ 80%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/ReduceOpsKernel.cpp.DEFAULT.cpp.o
[ 80%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/ReduceAllOpsKernel.cpp.DEFAULT.cpp.o
[ 80%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/RangeFactoriesKernel.cpp.DEFAULT.cpp.o
[ 80%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/PowKernel.cpp.DEFAULT.cpp.o
[ 80%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/PointwiseOpsKernel.cpp.DEFAULT.cpp.o
[ 80%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/PixelShuffleKernel.cpp.DEFAULT.cpp.o
[ 80%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/PaddingKernel.cpp.DEFAULT.cpp.o
[ 80%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/NativeMultiheadAttnKernel.cpp.DEFAULT.cpp.o
[ 80%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/MultinomialKernel.cpp.DEFAULT.cpp.o
[ 80%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/MaxUnpoolKernel.cpp.DEFAULT.cpp.o
[ 80%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/MaxPooling.cpp.DEFAULT.cpp.o
[ 80%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/MaxPoolKernel.cpp.DEFAULT.cpp.o
[ 80%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/LinearAlgebraKernel.cpp.DEFAULT.cpp.o
[ 80%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/LerpKernel.cpp.DEFAULT.cpp.o
[ 80%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/IndexKernel.cpp.DEFAULT.cpp.o
[ 80%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/HistogramKernel.cpp.DEFAULT.cpp.o
[ 80%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/GridSamplerKernel.cpp.DEFAULT.cpp.o
[ 80%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/FunctionOfAMatrixUtilsKernel.cpp.DEFAULT.cpp.o
[ 80%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/FlashAttentionKernel.cpp.DEFAULT.cpp.o
[ 80%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/FillKernel.cpp.DEFAULT.cpp.o
[ 80%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/DistributionKernels.cpp.DEFAULT.cpp.o
[ 80%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/DistanceOpsKernel.cpp.DEFAULT.cpp.o
[ 80%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/DepthwiseConvKernel.cpp.DEFAULT.cpp.o
[ 80%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/CrossKernel.cpp.DEFAULT.cpp.o
[ 80%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/CopyKernel.cpp.DEFAULT.cpp.o
[ 80%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/ComplexKernel.cpp.DEFAULT.cpp.o
[ 80%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/ChannelShuffleKernel.cpp.DEFAULT.cpp.o
[ 80%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/CatKernel.cpp.DEFAULT.cpp.o
[ 80%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/BlasKernel.cpp.DEFAULT.cpp.o
[ 80%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/BinaryOpsKernel.cpp.DEFAULT.cpp.o
[ 80%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/AvgPoolKernel.cpp.DEFAULT.cpp.o
[ 80%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/AmpGradScalerKernels.cpp.DEFAULT.cpp.o
[ 80%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/AdaptiveMaxPoolKernel.cpp.DEFAULT.cpp.o
[ 80%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/AdaptiveAvgPoolKernel.cpp.DEFAULT.cpp.o
[ 80%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/Activation.cpp.DEFAULT.cpp.o
[ 80%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/vulkan/Context.cpp.o
[ 80%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/metal/Context.cpp.o
[ 80%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/core/common.cc.o
[ 80%] Building C object caffe2/CMakeFiles/torch_cpu.dir/__/third_party/miniz-2.1.0/miniz.c.o
/builddir/build/BUILD/pytorch/third_party/miniz-2.1.0/miniz.c:3157:9: note: ‘#pragma message: Using fopen, ftello, fseeko, stat() etc. path for file I/O - this path may not support large files.’
 3157 | #pragma message("Using fopen, ftello, fseeko, stat() etc. path for file I/O - this path may not support large files.")
      |         ^~~~~~~
[ 80%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/serialize/inline_container.cc.o
[ 80%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/serialize/istream_adapter.cc.o
[ 80%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/serialize/file_adapter.cc.o
[ 81%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/serialize/crc.cc.o
[ 81%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/serialize/read_adapter_interface.cc.o
[ 81%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/utils/string_utils.cc.o
[ 81%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/utils/threadpool/ThreadPool.cc.o
[ 81%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/utils/threadpool/pthreadpool-cpp.cc.o
[ 81%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/utils/threadpool/thread_pool_guard.cpp.o
[ 81%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/utils/proto_wrap.cc.o
[ 81%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/perfkernels/adagrad.cc.o
[ 81%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/perfkernels/batch_box_cox.cc.o
[ 81%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/perfkernels/embedding_lookup.cc.o
[ 81%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/perfkernels/embedding_lookup_idx.cc.o
[ 81%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/perfkernels/fused_8bit_rowwise_embedding_lookup.cc.o
[ 81%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/perfkernels/fused_8bit_rowwise_embedding_lookup_idx.cc.o
[ 81%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/perfkernels/fused_nbit_rowwise_conversion.cc.o
[ 81%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/perfkernels/lstm_unit_cpu_common.cc.o
[ 81%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/perfkernels/math_cpu_base.cc.o
[ 81%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/perfkernels/typed_axpy.cc.o
[ 81%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/autograd/generated/Functions.cpp.o
[ 81%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/autograd/generated/ViewFuncs.cpp.o
[ 81%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/autograd/generated/VariableType_0.cpp.o
[ 81%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/autograd/generated/VariableType_1.cpp.o
[ 81%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/autograd/generated/VariableType_2.cpp.o
[ 81%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/autograd/generated/VariableType_3.cpp.o
[ 81%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/autograd/generated/VariableType_4.cpp.o
[ 81%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/autograd/generated/TraceType_0.cpp.o
[ 81%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/autograd/generated/TraceType_1.cpp.o
[ 81%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/autograd/generated/TraceType_2.cpp.o
[ 81%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/autograd/generated/TraceType_3.cpp.o
[ 81%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/autograd/generated/TraceType_4.cpp.o
[ 81%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/autograd/generated/ADInplaceOrViewType_0.cpp.o
[ 81%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/autograd/generated/ADInplaceOrViewType_1.cpp.o
[ 81%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/inductor/aoti_torch/generated/c_shim_cpu.cpp.o
[ 81%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/lazy/generated/LazyNativeFunctions.cpp.o
[ 81%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/lazy/generated/RegisterAutogradLazy.cpp.o
[ 81%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/lazy/generated/RegisterLazy.cpp.o
[ 81%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/autograd/anomaly_mode.cpp.o
[ 81%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/autograd/autograd.cpp.o
[ 81%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/autograd/autograd_meta.cpp.o
[ 81%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/autograd/autograd_not_implemented_fallback.cpp.o
[ 81%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/autograd/cpp_hook.cpp.o
[ 81%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/autograd/custom_function.cpp.o
[ 81%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/autograd/engine.cpp.o
[ 81%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/autograd/forward_grad.cpp.o
[ 81%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/autograd/function.cpp.o
[ 81%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/autograd/functions/accumulate_grad.cpp.o
[ 81%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/autograd/functions/basic_ops.cpp.o
[ 81%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/autograd/functions/tensor.cpp.o
[ 81%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/autograd/functions/utils.cpp.o
[ 81%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/autograd/input_buffer.cpp.o
[ 81%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/autograd/input_metadata.cpp.o
[ 81%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/autograd/jit_decomp_interface.cpp.o
[ 81%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/autograd/profiler_kineto.cpp.o
[ 81%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/autograd/profiler_legacy.cpp.o
[ 81%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/autograd/record_function_ops.cpp.o
[ 81%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/autograd/saved_variable.cpp.o
[ 81%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/autograd/utils/warnings.cpp.o
[ 81%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/autograd/variable.cpp.o
[ 81%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/autograd/variable_info.cpp.o
[ 81%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/inductor/aoti_runner/model_container_runner.cpp.o
[ 82%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/inductor/aoti_runner/model_container_runner_cpu.cpp.o
[ 82%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/inductor/aoti_torch/shim_common.cpp.o
[ 82%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/inductor/aoti_torch/tensor_converter.cpp.o
[ 82%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/inductor/inductor_ops.cpp.o
[ 82%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/api/function_impl.cpp.o
[ 82%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/api/module.cpp.o
[ 82%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/api/object.cpp.o
[ 82%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/backends/backend_debug_handler.cpp.o
[ 82%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/backends/backend_debug_info.cpp.o
[ 82%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/backends/backend_detail.cpp.o
[ 82%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/backends/backend_interface.cpp.o
[ 82%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/backends/backend_resolver.cpp.o
[ 82%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/codegen/fuser/codegen.cpp.o
[ 82%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/codegen/fuser/compiler.cpp.o
[ 82%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/codegen/fuser/executor.cpp.o
[ 82%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/codegen/fuser/fallback.cpp.o
[ 82%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/codegen/fuser/interface.cpp.o
[ 82%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/codegen/fuser/kernel_cache.cpp.o
[ 82%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/frontend/builtin_functions.cpp.o
[ 82%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/frontend/canonicalize_modified_loop.cpp.o
[ 82%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/frontend/convert_to_ssa.cpp.o
[ 82%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/frontend/edit_distance.cpp.o
[ 82%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/frontend/exit_transforms.cpp.o
[ 82%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/frontend/inline_loop_condition.cpp.o
[ 82%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/frontend/ir_emitter.cpp.o
[ 82%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/frontend/name_mangler.cpp.o
[ 82%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/frontend/parser.cpp.o
[ 82%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/frontend/schema_matching.cpp.o
[ 82%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/frontend/script_type_parser.cpp.o
[ 82%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/frontend/sugared_value.cpp.o
[ 82%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/frontend/tracer.cpp.o
[ 82%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/frontend/versioned_symbols.cpp.o
[ 82%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/ir/alias_analysis.cpp.o
[ 82%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/ir/attributes.cpp.o
[ 82%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/ir/constants.cpp.o
[ 82%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/ir/graph_utils.cpp.o
[ 82%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/ir/ir.cpp.o
[ 82%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/ir/irparser.cpp.o
[ 82%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/ir/node_hashing.cpp.o
[ 82%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/ir/scope.cpp.o
[ 82%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/ir/subgraph_matcher.cpp.o
[ 82%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/ir/type_hashing.cpp.o
[ 82%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/jit_log.cpp.o
[ 82%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/jit_opt_limit.cpp.o
[ 82%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/mobile/compatibility/model_compatibility.cpp.o
[ 82%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/mobile/compatibility/runtime_compatibility.cpp.o
[ 82%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/mobile/flatbuffer_loader.cpp.o
[ 82%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/mobile/function.cpp.o
[ 82%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/mobile/import.cpp.o
[ 82%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/mobile/interpreter.cpp.o
[ 82%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/mobile/module.cpp.o
[ 82%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/mobile/nnc/aot_compiler.cpp.o
[ 82%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/mobile/nnc/backend.cpp.o
[ 82%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/mobile/nnc/context.cpp.o
[ 82%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/mobile/nnc/registry.cpp.o
[ 82%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/mobile/observer.cpp.o
[ 82%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/mobile/parse_bytecode.cpp.o
[ 82%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/mobile/parse_operators.cpp.o
[ 82%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/mobile/prim_ops_registery.cpp.o
[ 82%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/mobile/promoted_prim_ops.cpp.o
[ 83%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/mobile/quantization.cpp.o
[ 83%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/mobile/register_ops_common_utils.cpp.o
[ 83%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/mobile/type_parser.cpp.o
[ 83%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/mobile/upgrader_mobile.cpp.o
[ 83%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/operator_upgraders/upgraders.cpp.o
[ 83%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/operator_upgraders/upgraders_entry.cpp.o
[ 83%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/operator_upgraders/utils.cpp.o
[ 83%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/operator_upgraders/version_map.cpp.o
[ 83%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/add_if_then_else.cpp.o
[ 83%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/annotate_warns.cpp.o
[ 83%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/bailout_graph.cpp.o
[ 83%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/batch_mm.cpp.o
[ 83%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/canonicalize.cpp.o
[ 83%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/canonicalize_graph_fuser_ops.cpp.o
[ 83%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/check_strict_fusion.cpp.o
[ 83%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/clear_profiling.cpp.o
[ 83%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/clear_undefinedness.cpp.o
[ 83%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/common_subexpression_elimination.cpp.o
[ 83%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/concat_opt.cpp.o
[ 83%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/constant_pooling.cpp.o
[ 83%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/constant_propagation.cpp.o
[ 83%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/create_autodiff_subgraphs.cpp.o
[ 83%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/create_functional_graphs.cpp.o
[ 83%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/dbr_quantization/remove_redundant_aliases.cpp.o
[ 83%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/dead_code_elimination.cpp.o
[ 83%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/decompose_ops.cpp.o
[ 83%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/device_type_analysis.cpp.o
[ 83%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/dtype_analysis.cpp.o
[ 83%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/eliminate_no_ops.cpp.o
[ 83%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/erase_number_types.cpp.o
[ 83%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/fixup_trace_scope_blocks.cpp.o
[ 83%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/fold_conv_bn.cpp.o
[ 83%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/fold_linear_bn.cpp.o
[ 83%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/freeze_module.cpp.o
[ 83%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/frozen_concat_linear.cpp.o
[ 83%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/frozen_conv_add_relu_fusion.cpp.o
[ 83%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/frozen_conv_folding.cpp.o
[ 83%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/frozen_graph_optimizations.cpp.o
[ 83%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/frozen_linear_folding.cpp.o
[ 83%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/frozen_linear_transpose.cpp.o
[ 83%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/frozen_ops_to_mkldnn.cpp.o
[ 83%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/fuse_linear.cpp.o
[ 83%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/fuse_relu.cpp.o
[ 83%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/graph_fuser.cpp.o
[ 83%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/graph_rewrite_helper.cpp.o
[ 83%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/guard_elimination.cpp.o
[ 83%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/hoist_conv_packed_params.cpp.o
[ 83%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/inline_autodiff_subgraphs.cpp.o
[ 83%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/inline_fork_wait.cpp.o
[ 83%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/inline_forked_closures.cpp.o
[ 83%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/inliner.cpp.o
[ 83%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/inplace_check.cpp.o
[ 83%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/insert_guards.cpp.o
[ 83%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/integer_value_refinement.cpp.o
[ 83%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/lift_closures.cpp.o
[ 83%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/liveness.cpp.o
[ 83%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/loop_unrolling.cpp.o
[ 83%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/lower_grad_of.cpp.o
[ 83%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/lower_tuples.cpp.o
[ 83%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/metal_rewrite.cpp.o
[ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/mkldnn_rewrite.cpp.o
[ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/normalize_ops.cpp.o
[ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/pass_manager.cpp.o
[ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/peephole.cpp.o
[ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/peephole_alias_sensitive.cpp.o
[ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/peephole_dict_idioms.cpp.o
[ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/peephole_list_idioms.cpp.o
[ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/peephole_non_tensor.cpp.o
[ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/prepack_folding.cpp.o
[ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/quantization/dedup_module_uses.cpp.o
[ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/quantization/finalize.cpp.o
[ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/quantization/fusion_passes.cpp.o
[ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/quantization/helper.cpp.o
[ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/quantization/insert_observers.cpp.o
[ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/quantization/insert_quant_dequant.cpp.o
[ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/quantization/quantization_type.cpp.o
[ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/quantization/register_packed_params.cpp.o
[ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/refine_tuple_types.cpp.o
[ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/remove_dropout.cpp.o
[ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/remove_exceptions.cpp.o
[ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/remove_expands.cpp.o
[ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/remove_mutation.cpp.o
[ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/remove_redundant_profiles.cpp.o
[ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/replacement_of_old_operators.cpp.o
[ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/requires_grad_analysis.cpp.o
[ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/restore_mutation.cpp.o
[ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/shape_analysis.cpp.o
[ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/specialize_autogradzero.cpp.o
[ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/subgraph_rewrite.cpp.o
[ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/symbolic_shape_analysis.cpp.o
[ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/symbolic_shape_cache.cpp.o
[ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/symbolic_shape_runtime_fusion.cpp.o
[ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/tensorexpr_fuser.cpp.o
[ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/update_differentiable_graph_requires_grad.cpp.o
[ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/utils/memory_dag.cpp.o
[ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/utils/op_registry.cpp.o
[ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/utils/optimization_utils.cpp.o
[ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/utils/subgraph_utils.cpp.o
[ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/value_refinement_utils.cpp.o
[ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/variadic_ops.cpp.o
[ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/vulkan_rewrite.cpp.o
[ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/xnnpack_rewrite.cpp.o
[ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/python/update_graph_executor_opt.cpp.o
[ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/python/utf8_decoding_ignore.cpp.o
[ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/runtime/argument_spec.cpp.o
[ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/runtime/autodiff.cpp.o
[ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/runtime/decomposition_registry.cpp.o
[ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/runtime/decomposition_registry_util.cpp.o
[ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/runtime/graph_executor.cpp.o
[ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/runtime/instruction.cpp.o
[ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/runtime/interpreter.cpp.o
[ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/runtime/interpreter/frame.cpp.o
[ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/runtime/interpreter/preprocess_graph.cpp.o
[ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/runtime/jit_exception.cpp.o
[ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/runtime/jit_trace.cpp.o
[ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/runtime/logging.cpp.o
[ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/runtime/operator.cpp.o
[ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/runtime/print_handler.cpp.o
[ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/runtime/profiling_graph_executor_impl.cpp.o
[ 85%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/runtime/profiling_record.cpp.o
[ 85%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/runtime/register_ops_utils.cpp.o
[ 85%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/runtime/script_profile.cpp.o
[ 85%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/runtime/serialized_shape_function_registry.cpp.o
[ 85%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/runtime/simple_graph_executor_impl.cpp.o
[ 85%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/runtime/slice_indices_adjust.cpp.o
[ 85%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/runtime/static/fusion.cpp.o
[ 85%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/runtime/static/generated_ops.cpp.o
[ 85%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/runtime/static/impl.cpp.o
[ 85%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/runtime/static/memory_planner.cpp.o
[ 85%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/runtime/static/native_ops.cpp.o
[ 85%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/runtime/static/ops.cpp.o
[ 85%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/runtime/static/passes.cpp.o
[ 85%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/runtime/static/te_wrapper.cpp.o
[ 85%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/runtime/symbolic_script.cpp.o
[ 85%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/runtime/symbolic_shape_registry.cpp.o
[ 85%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/runtime/symbolic_shape_registry_util.cpp.o
[ 85%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/runtime/vararg_functions.cpp.o
[ 85%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/serialization/callstack_debug_info_serialization.cpp.o
[ 85%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/serialization/import.cpp.o
[ 85%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/serialization/import_export_helpers.cpp.o
[ 85%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/serialization/import_read.cpp.o
[ 85%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/serialization/import_source.cpp.o
[ 85%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/serialization/pickle.cpp.o
[ 85%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/serialization/pickler.cpp.o
[ 85%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/serialization/python_print.cpp.o
[ 85%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/serialization/source_range_serialization.cpp.o
[ 85%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/serialization/type_name_uniquer.cpp.o
[ 85%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/serialization/unpickler.cpp.o
[ 85%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/tensorexpr/block_codegen.cpp.o
[ 85%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/tensorexpr/bounds_inference.cpp.o
[ 85%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/tensorexpr/bounds_overlap.cpp.o
[ 85%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/tensorexpr/codegen.cpp.o
[ 85%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/tensorexpr/cpp_codegen.cpp.o
[ 85%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/tensorexpr/eval.cpp.o
[ 85%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/tensorexpr/expr.cpp.o
[ 85%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/tensorexpr/external_functions.cpp.o
[ 85%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/tensorexpr/external_functions_codegen.cpp.o
[ 85%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/tensorexpr/external_functions_core.cpp.o
[ 85%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/tensorexpr/external_functions_registry.cpp.o
[ 85%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/tensorexpr/graph_opt.cpp.o
[ 85%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/tensorexpr/hash_provider.cpp.o
[ 85%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/tensorexpr/intrinsic_symbols.cpp.o
[ 85%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/tensorexpr/ir.cpp.o
[ 85%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/tensorexpr/ir_cloner.cpp.o
[ 85%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/tensorexpr/ir_mutator.cpp.o
[ 85%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/tensorexpr/ir_printer.cpp.o
[ 85%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/tensorexpr/ir_simplifier.cpp.o
[ 85%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/tensorexpr/ir_verifier.cpp.o
[ 85%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/tensorexpr/ir_visitor.cpp.o
[ 85%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/tensorexpr/kernel.cpp.o
[ 85%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/tensorexpr/llvm_codegen.cpp.o
[ 85%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/tensorexpr/llvm_jit.cpp.o
[ 85%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/tensorexpr/loopnest.cpp.o
[ 85%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/tensorexpr/loopnest_randomization.cpp.o
[ 85%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/tensorexpr/lowerings.cpp.o
[ 85%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/tensorexpr/mem_dependency_checker.cpp.o
[ 85%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/tensorexpr/operators/conv2d.cpp.o
[ 85%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/tensorexpr/operators/matmul.cpp.o
[ 85%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/tensorexpr/operators/misc.cpp.o
[ 86%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/tensorexpr/operators/norm.cpp.o
[ 86%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/tensorexpr/operators/pointwise.cpp.o
[ 86%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/tensorexpr/operators/quantization.cpp.o
[ 86%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/tensorexpr/operators/reduction.cpp.o
[ 86%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/tensorexpr/operators/softmax.cpp.o
[ 86%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/tensorexpr/reduction.cpp.o
[ 86%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/tensorexpr/registerizer.cpp.o
[ 86%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/tensorexpr/tensor.cpp.o
[ 86%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/tensorexpr/types.cpp.o
[ 86%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/tensorexpr/unique_name_manager.cpp.o
[ 86%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/testing/file_check.cpp.o
[ 86%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/testing/hooks_for_testing.cpp.o
[ 86%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/lazy/backend/backend_device.cpp.o
[ 86%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/lazy/backend/backend_interface.cpp.o
[ 86%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/lazy/backend/lowering_context.cpp.o
[ 86%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/lazy/core/config.cpp.o
[ 86%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/lazy/core/debug_util.cpp.o
[ 86%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/lazy/core/hash.cpp.o
[ 86%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/lazy/core/helpers.cpp.o
[ 86%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/lazy/core/ir.cpp.o
[ 86%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/lazy/core/ir_dump_util.cpp.o
[ 86%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/lazy/core/ir_metadata.cpp.o
[ 86%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/lazy/core/ir_util.cpp.o
[ 86%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/lazy/core/lazy_graph_executor.cpp.o
[ 86%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/lazy/core/metrics.cpp.o
caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/lazy/core/multi_wait.cpp.o [ 86%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/lazy/core/ops/arithmetic_ir_ops.cpp.o [ 86%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/lazy/core/ops/utils.cpp.o [ 86%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/lazy/core/permutation_util.cpp.o [ 86%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/lazy/core/shape.cpp.o [ 86%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/lazy/core/shape_inference.cpp.o [ 86%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/lazy/core/tensor.cpp.o [ 86%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/lazy/core/tensor_impl.cpp.o [ 86%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/lazy/core/tensor_util.cpp.o [ 86%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/lazy/core/thread_pool.cpp.o [ 86%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/lazy/core/trie.cpp.o [ 86%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/monitor/counters.cpp.o [ 86%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/monitor/events.cpp.o [ 86%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/profiler/collection.cpp.o [ 86%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/profiler/combined_traceback.cpp.o [ 86%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/profiler/data_flow.cpp.o [ 86%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/profiler/kineto_client_interface.cpp.o [ 86%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/profiler/kineto_shim.cpp.o [ 86%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/profiler/orchestration/observer.cpp.o [ 86%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/profiler/orchestration/python_tracer.cpp.o [ 86%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/profiler/orchestration/vulkan.cpp.o [ 86%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/profiler/perf.cpp.o [ 86%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/profiler/standalone/execution_trace_observer.cpp.o [ 86%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/profiler/standalone/itt_observer.cpp.o [ 86%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/profiler/standalone/nvtx_observer.cpp.o [ 86%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/profiler/stubs/base.cpp.o [ 86%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/profiler/unwind/unwind.cpp.o [ 86%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/profiler/unwind/unwind_fb.cpp.o [ 86%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/profiler/util.cpp.o [ 86%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/utils/cpp_stacktraces.cpp.o [ 86%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/utils/schema_info.cpp.o [ 86%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/utils/tensor_flatten.cpp.o [ 86%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/utils/variadic.cpp.o [ 86%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/codegen/cuda/interface.cpp.o [ 86%] Building CXX object 
caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/autocast.cpp.o [ 87%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/lower_graph.cpp.o [ 87%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/remove_inplace_ops.cpp.o [ 87%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/utils/check_alias_annotation.cpp.o [ 87%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/runtime/register_c10_ops.cpp.o [ 87%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/runtime/register_prim_ops.cpp.o [ 87%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/runtime/register_prim_ops_fulljit.cpp.o [ 87%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/runtime/register_special_ops.cpp.o [ 87%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/mobile/debug_info.cpp.o [ 87%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/lazy/ts_backend/dynamic_ir.cpp.o [ 87%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/lazy/ts_backend/config.cpp.o [ 87%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/lazy/ts_backend/ops/device_data.cpp.o [ 87%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/lazy/ts_backend/ops/generic.cpp.o [ 87%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/lazy/ts_backend/tensor_aten_ops.cpp.o [ 87%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/lazy/ts_backend/ts_autograd_functions.cpp.o [ 87%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/lazy/ts_backend/ts_backend_impl.cpp.o [ 87%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/lazy/ts_backend/ts_eager_fallback.cpp.o [ 87%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/lazy/ts_backend/ts_lowering_context.cpp.o [ 87%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/lazy/ts_backend/ts_native_functions.cpp.o [ 87%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/lazy/ts_backend/ts_node.cpp.o [ 87%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/lazy/ts_backend/ts_node_lowering.cpp.o [ 87%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/mobile/import_data.cpp.o [ 87%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/mobile/train/export_data.cpp.o [ 87%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/mobile/train/optim/sgd.cpp.o [ 87%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/mobile/train/random.cpp.o [ 87%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/mobile/train/sequential.cpp.o [ 87%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/serialization/flatbuffer_serializer.cpp.o [ 87%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/autograd/FunctionsManual.cpp.o [ 87%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/utils/out_types.cpp.o [ 87%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/autograd/TraceTypeManual.cpp.o [ 87%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/autograd/VariableTypeManual.cpp.o [ 87%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/api/src/jit.cpp.o [ 87%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/mobile/compatibility/backport.cpp.o [ 87%] Building CXX 
[ 87%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/mobile/compatibility/backport_manager.cpp.o
[ 87%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/serialization/onnx.cpp.o
[ 87%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/serialization/export.cpp.o
[ 87%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/serialization/export_bytecode.cpp.o
[ 87%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/serialization/export_module.cpp.o
[ 87%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/codegen/fuser/cpu/fused_kernel.cpp.o
[ 87%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/api/module_save.cpp.o
[ 87%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/utils/byte_order.cpp.o
[ 87%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/c10d/Backend.cpp.o
[ 87%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/c10d/FileStore.cpp.o
[ 87%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/c10d/Functional.cpp.o
[ 87%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/c10d/GlooDeviceFactory.cpp.o
[ 87%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/c10d/GroupRegistry.cpp.o
[ 87%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/c10d/Ops.cpp.o
[ 87%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/c10d/ParamCommsUtils.cpp.o
[ 87%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/c10d/PrefixStore.cpp.o
[ 87%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/c10d/ProcessGroup.cpp.o
[ 87%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/c10d/ProcessGroupGloo.cpp.o
[ 87%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/c10d/ProcessGroupMPI.cpp.o
[ 87%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/c10d/ProcessGroupWrapper.cpp.o
[ 87%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/c10d/Store.cpp.o
[ 87%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/c10d/TCPStore.cpp.o
[ 87%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/c10d/TCPStoreBackend.cpp.o
[ 87%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/c10d/TCPStoreLibUvBackend.cpp.o
[ 87%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/c10d/Utils.cpp.o
[ 87%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/c10d/comm.cpp.o
[ 87%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/c10d/debug.cpp.o
[ 88%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/c10d/default_comm_hooks.cpp.o
[ 88%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/c10d/logger.cpp.o
[ 88%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/c10d/logging.cpp.o
[ 88%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/c10d/quantization/quantization.cpp.o
[ 88%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/c10d/reducer.cpp.o
[ 88%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/c10d/sequence_num.cpp.o
[ 88%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/c10d/socket.cpp.o
[ 88%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/c10d/Work.cpp.o
[ 88%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/autograd/autograd.cpp.o
[ 88%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/autograd/utils.cpp.o
[ 88%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/autograd/context/container.cpp.o
[ 88%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/autograd/context/context.cpp.o
[ 88%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/autograd/engine/dist_engine.cpp.o
[ 88%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/autograd/functions/recvrpc_backward.cpp.o
[ 88%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/autograd/functions/sendrpc_backward.cpp.o
[ 88%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/autograd/rpc_messages/autograd_metadata.cpp.o
[ 88%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/autograd/rpc_messages/propagate_gradients_req.cpp.o
[ 88%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/autograd/rpc_messages/propagate_gradients_resp.cpp.o
[ 88%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/autograd/rpc_messages/cleanup_autograd_context_req.cpp.o
[ 88%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/autograd/rpc_messages/cleanup_autograd_context_resp.cpp.o
[ 88%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/autograd/rpc_messages/rpc_with_autograd.cpp.o
[ 88%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/autograd/rpc_messages/rpc_with_profiling_req.cpp.o
[ 88%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/autograd/rpc_messages/rpc_with_profiling_resp.cpp.o
[ 88%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/autograd/rpc_messages/rref_backward_req.cpp.o
[ 88%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/autograd/rpc_messages/rref_backward_resp.cpp.o
[ 88%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/c10d/HashStore.cpp.o
[ 88%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/c10d/ProcessGroupRoundRobin.cpp.o
[ 88%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/rpc/agent_utils.cpp.o
[ 88%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/rpc/message.cpp.o
[ 88%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/rpc/profiler/remote_profiler_manager.cpp.o
[ 88%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/rpc/profiler/server_process_global_profiler.cpp.o
[ 88%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/rpc/python_call.cpp.o
[ 88%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/rpc/python_remote_call.cpp.o
[ 88%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/rpc/python_resp.cpp.o
[ 88%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/rpc/request_callback.cpp.o
[ 88%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/rpc/request_callback_no_python.cpp.o
[ 88%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/rpc/rpc_agent.cpp.o
[ 88%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/rpc/rref_context.cpp.o
[ 88%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/rpc/rref_impl.cpp.o
[ 88%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/rpc/rref_proto.cpp.o
[ 88%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/rpc/script_call.cpp.o
[ 88%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/rpc/script_remote_call.cpp.o
[ 88%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/rpc/script_resp.cpp.o
[ 88%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/rpc/tensorpipe_agent.cpp.o
[ 88%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/rpc/tensorpipe_utils.cpp.o
[ 88%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/rpc/testing/faulty_tensorpipe_agent.cpp.o
[ 88%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/rpc/torchscript_functions.cpp.o
[ 88%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/rpc/types.cpp.o
[ 88%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/rpc/utils.cpp.o
[ 88%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/api/src/cuda.cpp.o
[ 88%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/api/src/data/datasets/mnist.cpp.o
[ 88%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/api/src/data/samplers/distributed.cpp.o
[ 88%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/api/src/data/samplers/random.cpp.o
[ 88%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/api/src/data/samplers/sequential.cpp.o
[ 88%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/api/src/data/samplers/stream.cpp.o
[ 88%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/api/src/enum.cpp.o
[ 88%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/api/src/imethod.cpp.o
[ 88%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/api/src/serialize.cpp.o
[ 88%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/api/src/mps.cpp.o
[ 88%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/api/src/nn/init.cpp.o
[ 89%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/api/src/nn/module.cpp.o
[ 89%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/api/src/nn/modules/_functions.cpp.o
[ 89%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/api/src/nn/modules/activation.cpp.o
[ 89%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/api/src/nn/modules/adaptive.cpp.o
[ 89%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/api/src/nn/modules/batchnorm.cpp.o
[ 89%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/api/src/nn/modules/normalization.cpp.o
[ 89%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/api/src/nn/modules/instancenorm.cpp.o
[ 89%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/api/src/nn/modules/conv.cpp.o
[ 89%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/api/src/nn/modules/dropout.cpp.o
[ 89%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/api/src/nn/modules/distance.cpp.o
[ 89%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/api/src/nn/modules/embedding.cpp.o
[ 89%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/api/src/nn/modules/fold.cpp.o
[ 89%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/api/src/nn/modules/linear.cpp.o
[ 89%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/api/src/nn/modules/loss.cpp.o
[ 89%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/api/src/nn/modules/padding.cpp.o
[ 89%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/api/src/nn/modules/pixelshuffle.cpp.o
[ 89%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/api/src/nn/modules/pooling.cpp.o
[ 89%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/api/src/nn/modules/rnn.cpp.o
[ 89%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/api/src/nn/modules/upsampling.cpp.o
[ 89%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/api/src/nn/modules/transformer.cpp.o
[ 89%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/api/src/nn/modules/container/functional.cpp.o
[ 89%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/api/src/nn/options/activation.cpp.o
[ 89%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/api/src/nn/options/adaptive.cpp.o
[ 89%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/api/src/nn/options/batchnorm.cpp.o
[ 89%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/api/src/nn/options/embedding.cpp.o
[ 89%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/api/src/nn/options/instancenorm.cpp.o
[ 89%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/api/src/nn/options/normalization.cpp.o
[ 89%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/api/src/nn/options/conv.cpp.o
[ 89%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/api/src/nn/options/dropout.cpp.o
[ 89%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/api/src/nn/options/linear.cpp.o
[ 89%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/api/src/nn/options/padding.cpp.o
[ 89%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/api/src/nn/options/pooling.cpp.o
[ 89%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/api/src/nn/options/rnn.cpp.o
[ 89%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/api/src/nn/options/vision.cpp.o
[ 89%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/api/src/nn/options/transformer.cpp.o
[ 89%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/api/src/optim/adagrad.cpp.o
[ 89%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/api/src/optim/adam.cpp.o
[ 89%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/api/src/optim/adamw.cpp.o
[ 89%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/api/src/optim/lbfgs.cpp.o
[ 89%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/api/src/optim/optimizer.cpp.o
[ 89%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/api/src/optim/rmsprop.cpp.o
[ 89%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/api/src/optim/serialize.cpp.o
[ 89%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/api/src/optim/sgd.cpp.o
[ 89%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/api/src/optim/schedulers/lr_scheduler.cpp.o
[ 89%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/api/src/optim/schedulers/step_lr.cpp.o
[ 89%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/api/src/optim/schedulers/reduce_on_plateau_scheduler.cpp.o
[ 89%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/api/src/serialize/input-archive.cpp.o
[ 89%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/api/src/serialize/output-archive.cpp.o
[ 89%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/api/src/xpu.cpp.o
[ 89%] Linking CXX shared library ../lib/libtorch_cpu.so
Warning: Unused direct dependencies:
    libc10.so.2.4
    /lib64/libqnnpack.so.1
    /lib64/libgloo_cuda.so.1
    /lib64/liblmdb.so.0.0.0
    /lib64/libleveldb.so.1
    /lib64/libsnappy.so.1
    /lib64/libzmq.so.5
    /lib64/libhiredis.so.1.0.0
    /lib64/libopencv_highgui.so.409
    /lib64/libopencv_optflow.so.409
    /lib64/libopencv_videoio.so.409
    /lib64/libonnx_optimizer.so
    /lib64/libfoxi_loader.so.1
    /lib64/libsleef.so.3
    /lib64/libopencv_ximgproc.so.409
    /lib64/libopencv_imgcodecs.so.409
    /lib64/libopencv_video.so.409
    /lib64/libopencv_dnn.so.409
    /lib64/libopencv_calib3d.so.409
    /lib64/libopencv_features2d.so.409
    /lib64/libopencv_imgproc.so.409
    /lib64/libopencv_flann.so.409
    /lib64/libopencv_core.so.409
    /lib64/libopencv_cudev.so.409
    /usr/local/cuda-12.3/lib64/libcudart.so.12
[ 89%] Built target torch_cpu
[ 89%] Building CXX object caffe2/torch/lib/libshm/CMakeFiles/shm.dir/core.cpp.o
[ 89%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/cuda/CUDAGraph.cpp.o
[ 89%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/cuda/CUDAContext.cpp.o
[ 89%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/cuda/CUDAGeneratorImpl.cpp.o
[ 89%] Linking CXX shared library ../../../../lib/libshm.so
Warning: Unused direct dependencies:
    libtorch_cpu.so.2.4
    /lib64/libprotobuf.so.32
    libc10.so.2.4
    /lib64/libgflags.so.2.2
    /lib64/libglog.so.0
    /lib64/libqnnpack.so.1
    /lib64/libgloo.so.1
    /lib64/libgloo_cuda.so.1
    /lib64/libm.so.6
[ 89%] Built target shm
[ 89%] Building CXX object caffe2/torch/lib/libshm/CMakeFiles/torch_shm_manager.dir/manager.cpp.o
[ 90%] Linking CXX executable ../../../../bin/torch_shm_manager
[ 90%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/cuda/CUDASparseDescriptors.cpp.o
Warning: Unused direct dependencies:
    libshm.so.2.4
    libc10.so.2.4
    /lib64/libgflags.so.2.2
    /lib64/libglog.so.0
    /lib64/libm.so.6
[ 90%] Built target torch_shm_manager
[ 90%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/cuda/CachingHostAllocator.cpp.o
[ 90%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/cuda/CuSparseHandlePool.cpp.o
[ 90%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/cuda/EmptyTensor.cpp.o
[ 90%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/cuda/Exceptions.cpp.o
[ 91%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/cuda/PeerToPeerAccess.cpp.o
[ 91%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/cuda/PinnedMemoryAllocator.cpp.o
[ 91%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/cuda/detail/CUDAHooks.cpp.o
[ 91%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/cuda/detail/LazyNVRTC.cpp.o
[ 91%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/cuda/llvm_basic.cpp.o
[ 91%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/cuda/llvm_complex.cpp.o
[ 91%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/Resize.cpp.o
[ 91%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/SpectralOps.cpp.o
[ 91%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/TensorCompare.cpp.o
[ 91%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cudnn/AffineGridGenerator.cpp.o
[ 91%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cudnn/BatchNorm.cpp.o
[ 91%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cudnn/ConvPlaceholders.cpp.o
[ 91%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cudnn/ConvShared.cpp.o
[ 91%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cudnn/Conv_v7.cpp.o
[ 91%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cudnn/Conv_v8.cpp.o
[ 91%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cudnn/GridSampler.cpp.o
[ 91%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cudnn/LossCTC.cpp.o
[ 91%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cudnn/MHA.cpp.o
[ 91%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cudnn/RNN.cpp.o
[ 91%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/miopen/BatchNorm_miopen.cpp.o
[ 91%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/miopen/Conv_miopen.cpp.o
[ 91%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/miopen/RNN_miopen.cpp.o
[ 91%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/nested/cuda/NestedTensorTransformerUtils.cpp.o
[ 91%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/quantized/cuda/Activation.cpp.o
[ 91%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/quantized/cudnn/BinaryOps.cpp.o
[ 91%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/quantized/cudnn/Conv.cpp.o
[ 91%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/quantized/cudnn/ConvPrepack.cpp.o
[ 91%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/quantized/cudnn/ConvUnpackImpl.cpp.o
[ 91%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/quantized/cudnn/Linear.cpp.o
[ 91%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/quantized/cudnn/LinearPrepack.cpp.o
[ 91%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/quantized/cudnn/LinearUnpackImpl.cpp.o
[ 91%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/quantized/cudnn/Pooling.cpp.o
[ 91%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/sparse/cuda/cuSPARSELtOps.cpp.o
[ 91%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/sdp_utils.cpp.o
[ 91%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/cudnn/AutocastRNN.cpp.o
[ 91%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/cudnn/Descriptors.cpp.o
[ 91%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/cudnn/Handle.cpp.o
[ 91%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/cudnn/Types.cpp.o
[ 91%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/torch/csrc/cuda/nccl.cpp.o
[ 91%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/torch/csrc/distributed/c10d/reducer_cuda.cpp.o
[ 91%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/torch/csrc/distributed/c10d/NCCLUtils.cpp.o
[ 91%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp.o
[ 91%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/torch/csrc/distributed/c10d/ProcessGroupUCC.cpp.o
[ 91%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/torch/csrc/distributed/c10d/UCCTracing.cpp.o
[ 91%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/torch/csrc/distributed/c10d/UCCUtils.cpp.o
[ 91%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/torch/csrc/distributed/c10d/intra_node_comm.cpp.o
[ 91%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/torch/csrc/distributed/c10d/intra_node_comm.cu.o
[ 91%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/torch/csrc/distributed/rpc/tensorpipe_cuda.cpp.o
[ 91%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/torch/csrc/distributed/c10d/quantization/quantization_gpu.cu.o
[ 91%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/torch/csrc/inductor/aoti_torch/generated/c_shim_cuda.cpp.o
[ 91%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/TensorFactories.cu.o
[ 91%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/cuda/Sleep.cu.o
[ 91%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/cuda/cub-RadixSortKeys.cu.o
[ 91%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/cuda/cub-RadixSortPairs.cu.o
[ 91%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/cuda/cub.cu.o
[ 91%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/cuda/detail/IndexUtils.cu.o
[ 91%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/cuda/jiterator.cu.o
[ 91%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/AbsKernel.cu.o
[ 91%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/ActivationEluKernel.cu.o
[ 92%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/ActivationGeluKernel.cu.o
[ 92%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/ActivationGluKernel.cu.o
[ 92%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/ActivationHardshrinkKernel.cu.o
[ 92%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/ActivationHardsigmoidKernel.cu.o
[ 92%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/ActivationHardswishKernel.cu.o
[ 92%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/ActivationHardtanhKernel.cu.o
[ 92%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/ActivationLeakyReluKernel.cu.o
[ 92%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/ActivationLogSigmoidKernel.cu.o
[ 92%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/ActivationMishKernel.cu.o
[ 92%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/ActivationPreluKernel.cu.o
[ 92%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/ActivationSiluKernel.cu.o
[ 92%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/ActivationSoftplusKernel.cu.o
[ 92%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/ActivationSoftshrinkKernel.cu.o
[ 92%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/ActivationThresholdKernel.cu.o
[ 92%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/AdaptiveAveragePooling.cu.o
[ 92%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/AdaptiveAveragePooling3d.cu.o
[ 92%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/AdaptiveMaxPooling2d.cu.o
[ 92%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/AdaptiveMaxPooling3d.cu.o
[ 92%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/AmpKernels.cu.o
[ 92%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/AveragePool2d.cu.o
[ 92%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/AveragePool3d.cu.o
[ 92%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/BinaryBitwiseOpsKernels.cu.o
[ 92%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/BinaryDivFloorKernel.cu.o
[ 92%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/BinaryDivTrueKernel.cu.o
[ 92%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/BinaryDivTruncKernel.cu.o
[ 92%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/BinaryGeometricKernels.cu.o
[ 92%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/BinaryLogicalOpsKernels.cu.o
[ 92%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/BinaryMiscBackwardOpsKernels.cu.o
[ 92%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/BinaryMiscOpsKernels.cu.o
[ 92%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/BinaryMulKernel.cu.o
[ 92%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/BinaryRemainderKernel.cu.o
[ 92%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/BinaryShiftOpsKernels.cu.o
[ 92%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/Bucketization.cu.o
[ 92%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/CUDAScalar.cu.o
[ 92%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/Col2Im.cu.o
[ 92%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/CompareEQKernel.cu.o
[ 92%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/CompareKernels.cu.o
[ 92%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/ComplexKernel.cu.o
[ 92%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/ConvolutionMM2d.cu.o
[ 92%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/Copy.cu.o
[ 92%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/CopysignKernel.cu.o
[ 92%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/CrossKernel.cu.o
[ 92%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/CumminmaxKernel.cu.o
[ 92%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/CumprodKernel.cu.o
[ 92%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/CumsumKernel.cu.o
[ 92%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/DepthwiseConv2d.cu.o
[ 92%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/DepthwiseConv3d.cu.o
[ 92%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/DilatedMaxPool2d.cu.o
[ 92%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/DilatedMaxPool3d.cu.o
[ 92%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/DistanceKernel.cu.o
[ 92%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/DistributionBernoulli.cu.o
[ 92%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/DistributionCauchyKernel.cu.o
[ 92%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/DistributionExponentialKernel.cu.o
[ 92%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/DistributionGeometricKernel.cu.o
[ 92%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/DistributionLogNormalKernel.cu.o
[ 92%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/DistributionNormal.cu.o
[ 92%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/DistributionRandomKernel.cu.o
[ 92%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/DistributionUniform.cu.o
[ 92%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/Distributions.cu.o
[ 92%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/Dropout.cu.o
[ 93%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/Embedding.cu.o
[ 93%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/EmbeddingBackwardKernel.cu.o
[ 93%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/EmbeddingBag.cu.o
[ 93%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/FillKernel.cu.o
[ 93%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/FlattenIndicesKernel.cu.o
[ 93%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/ForeachBinaryOpList.cu.o
[ 93%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/ForeachBinaryOpScalar.cu.o
[ 93%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/ForeachBinaryOpScalarList.cu.o
[ 93%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/ForeachBinaryOpScalarTensor.cu.o
[ 93%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/ForeachPointwiseOp.cu.o
[ 93%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/ForeachReduceOp.cu.o
[ 93%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/ForeachTernaryOp.cu.o
[ 93%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/ForeachUnaryOp.cu.o
[ 93%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/FractionalMaxPool2d.cu.o
[ 93%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/FractionalMaxPool3d.cu.o
[ 93%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/FunctionOfAMatrixUtilsKernel.cu.o
[ 93%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/FusedAdamKernel.cu.o
[ 93%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/FusedAdamWKernel.cu.o
[ 93%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/FusedSgdKernel.cu.o
[ 93%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/GcdLcmKernel.cu.o
[ 93%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/GridSampler.cu.o
[ 93%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/IGammaKernel.cu.o
[ 93%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/Im2Col.cu.o
[ 93%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/IndexKernel.cu.o
[ 93%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/Indexing.cu.o
[ 93%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/LegacyThrustHelpers.cu.o
[ 93%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/Lerp.cu.o
[ 93%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/LinearAlgebra.cu.o
[ 93%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/LogAddExpKernel.cu.o
[ 93%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/LogcumsumexpKernel.cu.o
[ 93%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/Loss.cu.o
[ 93%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/LossCTC.cu.o
[ 93%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/MaxMinElementwiseKernel.cu.o
[ 93%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/MaxUnpooling.cu.o
[ 93%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/MixedDtypesLinear.cu.o
[ 93%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/MultiLabelMarginCriterion.cu.o
[ 93%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/MultiMarginLoss.cu.o
[ 93%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/MultinomialKernel.cu.o
[ 93%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/NLLLoss2d.cu.o
[ 93%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/NaiveConvolutionTranspose2d.cu.o
[ 93%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/NaiveConvolutionTranspose3d.cu.o
[ 93%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/NaiveDilatedConvolution.cu.o
[ 93%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/Nonzero.cu.o
[ 93%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/Normalization.cu.o
/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cuh(122): warning #20054-D: dynamic initialization is not supported for a function-scope static __shared__ variable within a __device__/__global__ function
  scalar_t shared[32];
  ^
detected during:
  instantiation of "scalar_t at::native::reduce(Op, PTA, int) [with scalar_t=at::native::Float2, Op=at::native::GradOp>, PTA=at::GenericPackedTensorAccessor]" at line 389
  instantiation of "void at::native::batch_norm_backward_kernel(at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, __nv_bool, stat_accscalar_t) [with input_scalar_t=double, stat_scalar_t=double, stat_accscalar_t=double, index_t=int32_t]" at line 654
  instantiation of "std::tuple at::native::batch_norm_backward_cuda_template(const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, __nv_bool, double, std::array<__nv_bool, 3UL>) [with input_scalar_t=double, stat_scalar_t=double, index_t=int32_t]" at line 585 of /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cu
Remark: The warnings can be suppressed with "-diag-suppress <error-number>"
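Every #20054-D instance in this build is the same pattern: Normalization.cuh declares a block-scope __shared__ array whose element type (at::native::Float2, which has a user-provided constructor) would need dynamic initialization, and device code does not run constructors for function-scope static __shared__ objects, so nvcc warns and leaves the storage uninitialized. A minimal standalone sketch that reproduces the diagnostic (assumptions: any reasonably recent nvcc; the Pair type here merely stands in for Float2 and is not PyTorch code):

    // repro.cu -- compile with: nvcc repro.cu
    #include <cuda_runtime.h>
    #include <cstdio>

    // Stand-in for at::native::Float2: the user-provided default
    // constructor is what makes the array's initialization "dynamic".
    struct Pair {
        float v1, v2;
        __device__ Pair() : v1(0.f), v2(0.f) {}
    };

    __global__ void reduce_demo() {
        // nvcc emits warning #20054-D on this declaration; no constructor
        // ever runs, so the array starts out uninitialized.
        __shared__ Pair shared[32];
        // Threads therefore have to initialize the slots themselves.
        if (threadIdx.x < 32) shared[threadIdx.x] = Pair();
        __syncthreads();
        if (threadIdx.x == 0) printf("%f\n", shared[0].v1);
    }

    int main() {
        reduce_demo<<<1, 32>>>();
        return cudaDeviceSynchronize() == cudaSuccess ? 0 : 1;
    }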
/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cuh(122): warning #20054-D: dynamic initialization is not supported for a function-scope static __shared__ variable within a __device__/__global__ function
  scalar_t shared[32];
  ^
detected during:
  instantiation of "scalar_t at::native::reduce(Op, PTA, int) [with scalar_t=at::native::Float2, Op=at::native::GradOp>, PTA=at::GenericPackedTensorAccessor]" at line 389
  instantiation of "void at::native::batch_norm_backward_kernel(at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, __nv_bool, stat_accscalar_t) [with input_scalar_t=float, stat_scalar_t=float, stat_accscalar_t=float, index_t=int32_t]" at line 654
  instantiation of "std::tuple at::native::batch_norm_backward_cuda_template(const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, __nv_bool, double, std::array<__nv_bool, 3UL>) [with input_scalar_t=float, stat_scalar_t=float, index_t=int32_t]" at line 585 of /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cu
/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cuh(122): warning #20054-D: dynamic initialization is not supported for a function-scope static __shared__ variable within a __device__/__global__ function
  scalar_t shared[32];
  ^
detected during:
  instantiation of "scalar_t at::native::reduce(Op, PTA, int) [with scalar_t=at::native::Float2, Op=at::native::GradOp>, PTA=at::GenericPackedTensorAccessor]" at line 389
  instantiation of "void at::native::batch_norm_backward_kernel(at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, __nv_bool, stat_accscalar_t) [with input_scalar_t=c10::Half, stat_scalar_t=float, stat_accscalar_t=float, index_t=int32_t]" at line 654
  instantiation of "std::tuple at::native::batch_norm_backward_cuda_template(const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, __nv_bool, double, std::array<__nv_bool, 3UL>) [with input_scalar_t=c10::Half, stat_scalar_t=float, index_t=int32_t]" at line 585 of /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cu
/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cuh(122): warning #20054-D: dynamic initialization is not supported for a function-scope static __shared__ variable within a __device__/__global__ function
  scalar_t shared[32];
  ^
detected during:
  instantiation of "scalar_t at::native::reduce(Op, PTA, int) [with scalar_t=at::native::Float2, Op=at::native::GradOp>, PTA=at::GenericPackedTensorAccessor]" at line 389
  instantiation of "void at::native::batch_norm_backward_kernel(at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, __nv_bool, stat_accscalar_t) [with input_scalar_t=c10::BFloat16, stat_scalar_t=float, stat_accscalar_t=float, index_t=int32_t]" at line 654
  instantiation of "std::tuple at::native::batch_norm_backward_cuda_template(const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, __nv_bool, double, std::array<__nv_bool, 3UL>) [with input_scalar_t=c10::BFloat16, stat_scalar_t=float, index_t=int32_t]" at line 585 of /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cu
/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cuh(122): warning #20054-D: dynamic initialization is not supported for a function-scope static __shared__ variable within a __device__/__global__ function
  scalar_t shared[32];
  ^
detected during:
  instantiation of "scalar_t at::native::reduce(Op, PTA, int) [with scalar_t=at::native::Float2, Op=at::native::GradOp>, PTA=at::GenericPackedTensorAccessor]" at line 489
  instantiation of "void at::native::batch_norm_backward_reduce_kernel(at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor) [with input_scalar_t=double, stat_scalar_t=double, stat_accscalar_t=double, index_t=int32_t]" at line 831
  instantiation of "std::tuple at::native::batch_norm_backward_reduce_cuda_template(const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, __nv_bool, __nv_bool, __nv_bool) [with input_scalar_t=double, stat_scalar_t=double, index_t=int32_t]" at line 739 of /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cu
/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cuh(122): warning #20054-D: dynamic initialization is not supported for a function-scope static __shared__ variable within a __device__/__global__ function
  scalar_t shared[32];
  ^
detected during:
  instantiation of "scalar_t at::native::reduce(Op, PTA, int) [with scalar_t=at::native::Float2, Op=at::native::GradOp>, PTA=at::GenericPackedTensorAccessor]" at line 489
  instantiation of "void at::native::batch_norm_backward_reduce_kernel(at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor) [with input_scalar_t=double, stat_scalar_t=double, stat_accscalar_t=double, index_t=int64_t]" at line 831
  instantiation of "std::tuple at::native::batch_norm_backward_reduce_cuda_template(const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, __nv_bool, __nv_bool, __nv_bool) [with input_scalar_t=double, stat_scalar_t=double, index_t=int64_t]" at line 739 of /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cu
/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cuh(122): warning #20054-D: dynamic initialization is not supported for a function-scope static __shared__ variable within a __device__/__global__ function
  scalar_t shared[32];
  ^
detected during:
  instantiation of "scalar_t at::native::reduce(Op, PTA, int) [with scalar_t=at::native::Float2, Op=at::native::GradOp>, PTA=at::GenericPackedTensorAccessor]" at line 489
  instantiation of "void at::native::batch_norm_backward_reduce_kernel(at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor) [with input_scalar_t=float, stat_scalar_t=float, stat_accscalar_t=float, index_t=int32_t]" at line 831
  instantiation of "std::tuple at::native::batch_norm_backward_reduce_cuda_template(const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, __nv_bool, __nv_bool, __nv_bool) [with input_scalar_t=float, stat_scalar_t=float, index_t=int32_t]" at line 739 of /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cu
/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cuh(122): warning #20054-D: dynamic initialization is not supported for a function-scope static __shared__ variable within a __device__/__global__ function
  scalar_t shared[32];
  ^
detected during:
  instantiation of "scalar_t at::native::reduce(Op, PTA, int) [with scalar_t=at::native::Float2, Op=at::native::GradOp>, PTA=at::GenericPackedTensorAccessor]" at line 489
  instantiation of "void at::native::batch_norm_backward_reduce_kernel(at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor) [with input_scalar_t=float, stat_scalar_t=float, stat_accscalar_t=float, index_t=int64_t]" at line 831
  instantiation of "std::tuple at::native::batch_norm_backward_reduce_cuda_template(const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, __nv_bool, __nv_bool, __nv_bool) [with input_scalar_t=float, stat_scalar_t=float, index_t=int64_t]" at line 739 of /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cu
/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cuh(122): warning #20054-D: dynamic initialization is not supported for a function-scope static __shared__ variable within a __device__/__global__ function
  scalar_t shared[32];
  ^
detected during:
  instantiation of "scalar_t at::native::reduce(Op, PTA, int) [with scalar_t=at::native::Float2, Op=at::native::GradOp>, PTA=at::GenericPackedTensorAccessor]" at line 489
  instantiation of "void at::native::batch_norm_backward_reduce_kernel(at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor) [with input_scalar_t=c10::Half, stat_scalar_t=float, stat_accscalar_t=float, index_t=int32_t]" at line 831
  instantiation of "std::tuple at::native::batch_norm_backward_reduce_cuda_template(const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, __nv_bool, __nv_bool, __nv_bool) [with input_scalar_t=c10::Half, stat_scalar_t=float, index_t=int32_t]" at line 739 of /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cu
/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cuh(122): warning #20054-D: dynamic initialization is not supported for a function-scope static __shared__ variable within a __device__/__global__ function
  scalar_t shared[32];
  ^
detected during:
  instantiation of "scalar_t at::native::reduce(Op, PTA, int) [with scalar_t=at::native::Float2, Op=at::native::GradOp>, PTA=at::GenericPackedTensorAccessor]" at line 489
  instantiation of "void at::native::batch_norm_backward_reduce_kernel(at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor) [with input_scalar_t=c10::Half, stat_scalar_t=float, stat_accscalar_t=float, index_t=int64_t]" at line 831
  instantiation of "std::tuple at::native::batch_norm_backward_reduce_cuda_template(const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, __nv_bool, __nv_bool, __nv_bool) [with input_scalar_t=c10::Half, stat_scalar_t=float, index_t=int64_t]" at line 739 of /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cu
/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cuh(122): warning #20054-D: dynamic initialization is not supported for a function-scope static __shared__ variable within a __device__/__global__ function
  scalar_t shared[32];
  ^
detected during:
  instantiation of "scalar_t at::native::reduce(Op, PTA, int) [with scalar_t=at::native::Float2, Op=at::native::GradOp>, PTA=at::GenericPackedTensorAccessor]" at line 489
  instantiation of "void at::native::batch_norm_backward_reduce_kernel(at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor) [with input_scalar_t=c10::BFloat16, stat_scalar_t=float, stat_accscalar_t=float, index_t=int32_t]" at line 831
  instantiation of "std::tuple at::native::batch_norm_backward_reduce_cuda_template(const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, __nv_bool, __nv_bool, __nv_bool) [with input_scalar_t=c10::BFloat16, stat_scalar_t=float, index_t=int32_t]" at line 739 of /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cu
/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cuh(122): warning #20054-D: dynamic initialization is not supported for a function-scope static __shared__ variable within a __device__/__global__ function
  scalar_t shared[32];
  ^
detected during:
  instantiation of "scalar_t at::native::reduce(Op, PTA, int) [with scalar_t=at::native::Float2, Op=at::native::GradOp>, PTA=at::GenericPackedTensorAccessor]" at line 489
  instantiation of "void at::native::batch_norm_backward_reduce_kernel(at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor) [with input_scalar_t=c10::BFloat16, stat_scalar_t=float, stat_accscalar_t=float, index_t=int64_t]" at line 831
  instantiation of "std::tuple at::native::batch_norm_backward_reduce_cuda_template(const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, __nv_bool, __nv_bool, __nv_bool) [with input_scalar_t=c10::BFloat16, stat_scalar_t=float, index_t=int64_t]" at line 739 of /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cu
[ 93%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/PointwiseOpsKernel.cu.o
[ 93%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/PowKernel.cu.o
/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cuh(122): warning #20054-D: dynamic initialization is not supported for a function-scope static __shared__ variable within a __device__/__global__ function
  scalar_t shared[32];
  ^
detected during:
  instantiation of "scalar_t at::native::reduce(Op, PTA, int) [with scalar_t=at::native::Float2, Op=at::native::GradOp>, PTA=at::GenericPackedTensorAccessor]" at line 389
  instantiation of "void at::native::batch_norm_backward_kernel(at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, __nv_bool, stat_accscalar_t) [with input_scalar_t=double, stat_scalar_t=double, stat_accscalar_t=double, index_t=int32_t]" at line 654
  instantiation of "std::tuple at::native::batch_norm_backward_cuda_template(const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, __nv_bool, double, std::array<__nv_bool, 3UL>) [with input_scalar_t=double, stat_scalar_t=double, index_t=int32_t]" at line 585 of /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cu
Remark: The warnings can be suppressed with "-diag-suppress <error-number>"
during: instantiation of "scalar_t at::native::reduce(Op, PTA, int) [with scalar_t=at::native::Float2, Op=at::native::GradOp>, PTA=at::GenericPackedTensorAccessor]" at line 389 instantiation of "void at::native::batch_norm_backward_kernel(at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, __nv_bool, stat_accscalar_t) [with input_scalar_t=float, stat_scalar_t=float, stat_accscalar_t=float, index_t=int32_t]" at line 654 instantiation of "std::tuple at::native::batch_norm_backward_cuda_template(const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, __nv_bool, double, std::array<__nv_bool, 3UL>) [with input_scalar_t=float, stat_scalar_t=float, index_t=int32_t]" at line 585 of /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cu /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cuh(122): warning #20054-D: dynamic initialization is not supported for a function-scope static __shared__ variable within a __device__/__global__ function scalar_t shared[32]; ^ detected during: instantiation of "scalar_t at::native::reduce(Op, PTA, int) [with scalar_t=at::native::Float2, Op=at::native::GradOp>, PTA=at::GenericPackedTensorAccessor]" at line 389 instantiation of "void at::native::batch_norm_backward_kernel(at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, __nv_bool, stat_accscalar_t) [with input_scalar_t=c10::Half, stat_scalar_t=float, stat_accscalar_t=float, index_t=int32_t]" at line 654 instantiation of "std::tuple at::native::batch_norm_backward_cuda_template(const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, __nv_bool, double, std::array<__nv_bool, 3UL>) [with input_scalar_t=c10::Half, stat_scalar_t=float, index_t=int32_t]" at line 585 of /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cu /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cuh(122): warning #20054-D: dynamic initialization is not supported for a function-scope static __shared__ variable within a __device__/__global__ function scalar_t shared[32]; ^ detected during: instantiation of "scalar_t at::native::reduce(Op, PTA, int) [with scalar_t=at::native::Float2, Op=at::native::GradOp>, PTA=at::GenericPackedTensorAccessor]" at line 389 instantiation of "void at::native::batch_norm_backward_kernel(at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, __nv_bool, stat_accscalar_t) [with input_scalar_t=c10::BFloat16, stat_scalar_t=float, stat_accscalar_t=float, index_t=int32_t]" at line 654 instantiation of "std::tuple at::native::batch_norm_backward_cuda_template(const 
at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, __nv_bool, double, std::array<__nv_bool, 3UL>) [with input_scalar_t=c10::BFloat16, stat_scalar_t=float, index_t=int32_t]" at line 585 of /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cu /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cuh(122): warning #20054-D: dynamic initialization is not supported for a function-scope static __shared__ variable within a __device__/__global__ function scalar_t shared[32]; ^ detected during: instantiation of "scalar_t at::native::reduce(Op, PTA, int) [with scalar_t=at::native::Float2, Op=at::native::GradOp>, PTA=at::GenericPackedTensorAccessor]" at line 489 instantiation of "void at::native::batch_norm_backward_reduce_kernel(at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor) [with input_scalar_t=double, stat_scalar_t=double, stat_accscalar_t=double, index_t=int32_t]" at line 831 instantiation of "std::tuple at::native::batch_norm_backward_reduce_cuda_template(const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, __nv_bool, __nv_bool, __nv_bool) [with input_scalar_t=double, stat_scalar_t=double, index_t=int32_t]" at line 739 of /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cu /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cuh(122): warning #20054-D: dynamic initialization is not supported for a function-scope static __shared__ variable within a __device__/__global__ function scalar_t shared[32]; ^ detected during: instantiation of "scalar_t at::native::reduce(Op, PTA, int) [with scalar_t=at::native::Float2, Op=at::native::GradOp>, PTA=at::GenericPackedTensorAccessor]" at line 489 instantiation of "void at::native::batch_norm_backward_reduce_kernel(at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor) [with input_scalar_t=double, stat_scalar_t=double, stat_accscalar_t=double, index_t=int64_t]" at line 831 instantiation of "std::tuple at::native::batch_norm_backward_reduce_cuda_template(const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, __nv_bool, __nv_bool, __nv_bool) [with input_scalar_t=double, stat_scalar_t=double, index_t=int64_t]" at line 739 of /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cu /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cuh(122): warning #20054-D: dynamic initialization is not supported for a function-scope static __shared__ variable within a __device__/__global__ function scalar_t shared[32]; ^ detected during: instantiation of "scalar_t at::native::reduce(Op, PTA, int) [with scalar_t=at::native::Float2, Op=at::native::GradOp>, PTA=at::GenericPackedTensorAccessor]" at line 489 instantiation of "void at::native::batch_norm_backward_reduce_kernel(at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, 
at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor) [with input_scalar_t=float, stat_scalar_t=float, stat_accscalar_t=float, index_t=int32_t]" at line 831 instantiation of "std::tuple at::native::batch_norm_backward_reduce_cuda_template(const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, __nv_bool, __nv_bool, __nv_bool) [with input_scalar_t=float, stat_scalar_t=float, index_t=int32_t]" at line 739 of /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cu /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cuh(122): warning #20054-D: dynamic initialization is not supported for a function-scope static __shared__ variable within a __device__/__global__ function scalar_t shared[32]; ^ detected during: instantiation of "scalar_t at::native::reduce(Op, PTA, int) [with scalar_t=at::native::Float2, Op=at::native::GradOp>, PTA=at::GenericPackedTensorAccessor]" at line 489 instantiation of "void at::native::batch_norm_backward_reduce_kernel(at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor) [with input_scalar_t=float, stat_scalar_t=float, stat_accscalar_t=float, index_t=int64_t]" at line 831 instantiation of "std::tuple at::native::batch_norm_backward_reduce_cuda_template(const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, __nv_bool, __nv_bool, __nv_bool) [with input_scalar_t=float, stat_scalar_t=float, index_t=int64_t]" at line 739 of /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cu /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cuh(122): warning #20054-D: dynamic initialization is not supported for a function-scope static __shared__ variable within a __device__/__global__ function scalar_t shared[32]; ^ detected during: instantiation of "scalar_t at::native::reduce(Op, PTA, int) [with scalar_t=at::native::Float2, Op=at::native::GradOp>, PTA=at::GenericPackedTensorAccessor]" at line 489 instantiation of "void at::native::batch_norm_backward_reduce_kernel(at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor) [with input_scalar_t=c10::Half, stat_scalar_t=float, stat_accscalar_t=float, index_t=int32_t]" at line 831 instantiation of "std::tuple at::native::batch_norm_backward_reduce_cuda_template(const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, __nv_bool, __nv_bool, __nv_bool) [with input_scalar_t=c10::Half, stat_scalar_t=float, index_t=int32_t]" at line 739 of /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cu /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cuh(122): warning #20054-D: dynamic initialization is not supported for a function-scope static __shared__ variable within a __device__/__global__ function scalar_t shared[32]; ^ detected during: instantiation of "scalar_t at::native::reduce(Op, PTA, int) [with scalar_t=at::native::Float2, Op=at::native::GradOp>, PTA=at::GenericPackedTensorAccessor]" at line 489 instantiation of "void 
at::native::batch_norm_backward_reduce_kernel(at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor) [with input_scalar_t=c10::Half, stat_scalar_t=float, stat_accscalar_t=float, index_t=int64_t]" at line 831 instantiation of "std::tuple at::native::batch_norm_backward_reduce_cuda_template(const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, __nv_bool, __nv_bool, __nv_bool) [with input_scalar_t=c10::Half, stat_scalar_t=float, index_t=int64_t]" at line 739 of /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cu /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cuh(122): warning #20054-D: dynamic initialization is not supported for a function-scope static __shared__ variable within a __device__/__global__ function scalar_t shared[32]; ^ detected during: instantiation of "scalar_t at::native::reduce(Op, PTA, int) [with scalar_t=at::native::Float2, Op=at::native::GradOp>, PTA=at::GenericPackedTensorAccessor]" at line 489 instantiation of "void at::native::batch_norm_backward_reduce_kernel(at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor) [with input_scalar_t=c10::BFloat16, stat_scalar_t=float, stat_accscalar_t=float, index_t=int32_t]" at line 831 instantiation of "std::tuple at::native::batch_norm_backward_reduce_cuda_template(const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, __nv_bool, __nv_bool, __nv_bool) [with input_scalar_t=c10::BFloat16, stat_scalar_t=float, index_t=int32_t]" at line 739 of /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cu /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cuh(122): warning #20054-D: dynamic initialization is not supported for a function-scope static __shared__ variable within a __device__/__global__ function scalar_t shared[32]; ^ detected during: instantiation of "scalar_t at::native::reduce(Op, PTA, int) [with scalar_t=at::native::Float2, Op=at::native::GradOp>, PTA=at::GenericPackedTensorAccessor]" at line 489 instantiation of "void at::native::batch_norm_backward_reduce_kernel(at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor) [with input_scalar_t=c10::BFloat16, stat_scalar_t=float, stat_accscalar_t=float, index_t=int64_t]" at line 831 instantiation of "std::tuple at::native::batch_norm_backward_reduce_cuda_template(const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, __nv_bool, __nv_bool, __nv_bool) [with input_scalar_t=c10::BFloat16, stat_scalar_t=float, index_t=int64_t]" at line 739 of /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cu /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cuh(122): warning #20054-D: dynamic initialization is not supported for a function-scope static __shared__ variable within a __device__/__global__ function scalar_t shared[32]; ^ 
detected during: instantiation of "scalar_t at::native::reduce(Op, PTA, int) [with scalar_t=at::native::Float2, Op=at::native::GradOp>, PTA=at::GenericPackedTensorAccessor]" at line 389 instantiation of "void at::native::batch_norm_backward_kernel(at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, __nv_bool, stat_accscalar_t) [with input_scalar_t=double, stat_scalar_t=double, stat_accscalar_t=double, index_t=int32_t]" at line 654 instantiation of "std::tuple at::native::batch_norm_backward_cuda_template(const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, __nv_bool, double, std::array<__nv_bool, 3UL>) [with input_scalar_t=double, stat_scalar_t=double, index_t=int32_t]" at line 585 of /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cu Remark: The warnings can be suppressed with "-diag-suppress " /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cuh(122): warning #20054-D: dynamic initialization is not supported for a function-scope static __shared__ variable within a __device__/__global__ function scalar_t shared[32]; ^ detected during: instantiation of "scalar_t at::native::reduce(Op, PTA, int) [with scalar_t=at::native::Float2, Op=at::native::GradOp>, PTA=at::GenericPackedTensorAccessor]" at line 389 instantiation of "void at::native::batch_norm_backward_kernel(at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, __nv_bool, stat_accscalar_t) [with input_scalar_t=float, stat_scalar_t=float, stat_accscalar_t=float, index_t=int32_t]" at line 654 instantiation of "std::tuple at::native::batch_norm_backward_cuda_template(const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, __nv_bool, double, std::array<__nv_bool, 3UL>) [with input_scalar_t=float, stat_scalar_t=float, index_t=int32_t]" at line 585 of /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cu /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cuh(122): warning #20054-D: dynamic initialization is not supported for a function-scope static __shared__ variable within a __device__/__global__ function scalar_t shared[32]; ^ detected during: instantiation of "scalar_t at::native::reduce(Op, PTA, int) [with scalar_t=at::native::Float2, Op=at::native::GradOp>, PTA=at::GenericPackedTensorAccessor]" at line 389 instantiation of "void at::native::batch_norm_backward_kernel(at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, __nv_bool, stat_accscalar_t) [with input_scalar_t=c10::Half, stat_scalar_t=float, stat_accscalar_t=float, index_t=int32_t]" at line 654 instantiation of 
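For context on the diagnostic above: a function-scope __shared__ variable in CUDA is implicitly static, and device code never runs constructors for such variables, so nvcc emits #20054-D whenever the element type has a non-trivial default constructor, as at::native::Float2 does. Below is a minimal sketch that reproduces the warning under that assumption; the file name, the Float2 stand-in, warp_reduce_sketch, and kernel are illustrative, not the PyTorch sources.

// repro_20054.cu -- minimal, illustrative reproduction of nvcc warning #20054-D
#include <cuda_runtime.h>

struct Float2 {                                          // stand-in for at::native::Float2
    float v1, v2;
    __host__ __device__ Float2() : v1(0.f), v2(0.f) {}   // non-trivial default constructor
    __host__ __device__ Float2(float a, float b) : v1(a), v2(b) {}
};

template <typename scalar_t>
__device__ scalar_t warp_reduce_sketch(scalar_t val) {
    __shared__ scalar_t shared[32];  // #20054-D: construction of the 32 elements is skipped
    int lane = threadIdx.x % 32;
    shared[lane] = val;              // every slot is written before it is read,
    __syncthreads();                 // so the skipped construction is harmless here
    return shared[0];
}

__global__ void kernel(Float2 *out) {
    Float2 r = warp_reduce_sketch(Float2(1.f, 2.f));
    if (threadIdx.x == 0) *out = r;
}

int main() {
    Float2 *out = nullptr;
    cudaMalloc(&out, sizeof(Float2));
    kernel<<<1, 32>>>(out);
    cudaDeviceSynchronize();
    cudaFree(out);
    return 0;
}

Compiling this with "nvcc -c repro_20054.cu" prints the same #20054-D text as the log; it stays a warning rather than an error because skipping construction is safe as long as every element is assigned before use, which is the pattern above.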
"std::tuple at::native::batch_norm_backward_cuda_template(const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, __nv_bool, double, std::array<__nv_bool, 3UL>) [with input_scalar_t=c10::Half, stat_scalar_t=float, index_t=int32_t]" at line 585 of /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cu /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cuh(122): warning #20054-D: dynamic initialization is not supported for a function-scope static __shared__ variable within a __device__/__global__ function scalar_t shared[32]; ^ detected during: instantiation of "scalar_t at::native::reduce(Op, PTA, int) [with scalar_t=at::native::Float2, Op=at::native::GradOp>, PTA=at::GenericPackedTensorAccessor]" at line 389 instantiation of "void at::native::batch_norm_backward_kernel(at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, __nv_bool, stat_accscalar_t) [with input_scalar_t=c10::BFloat16, stat_scalar_t=float, stat_accscalar_t=float, index_t=int32_t]" at line 654 instantiation of "std::tuple at::native::batch_norm_backward_cuda_template(const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, __nv_bool, double, std::array<__nv_bool, 3UL>) [with input_scalar_t=c10::BFloat16, stat_scalar_t=float, index_t=int32_t]" at line 585 of /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cu /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cuh(122): warning #20054-D: dynamic initialization is not supported for a function-scope static __shared__ variable within a __device__/__global__ function scalar_t shared[32]; ^ detected during: instantiation of "scalar_t at::native::reduce(Op, PTA, int) [with scalar_t=at::native::Float2, Op=at::native::GradOp>, PTA=at::GenericPackedTensorAccessor]" at line 489 instantiation of "void at::native::batch_norm_backward_reduce_kernel(at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor) [with input_scalar_t=double, stat_scalar_t=double, stat_accscalar_t=double, index_t=int32_t]" at line 831 instantiation of "std::tuple at::native::batch_norm_backward_reduce_cuda_template(const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, __nv_bool, __nv_bool, __nv_bool) [with input_scalar_t=double, stat_scalar_t=double, index_t=int32_t]" at line 739 of /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cu /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cuh(122): warning #20054-D: dynamic initialization is not supported for a function-scope static __shared__ variable within a __device__/__global__ function scalar_t shared[32]; ^ detected during: instantiation of "scalar_t at::native::reduce(Op, PTA, int) [with scalar_t=at::native::Float2, Op=at::native::GradOp>, PTA=at::GenericPackedTensorAccessor]" at line 489 instantiation of "void 
at::native::batch_norm_backward_reduce_kernel(at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor) [with input_scalar_t=double, stat_scalar_t=double, stat_accscalar_t=double, index_t=int64_t]" at line 831 instantiation of "std::tuple at::native::batch_norm_backward_reduce_cuda_template(const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, __nv_bool, __nv_bool, __nv_bool) [with input_scalar_t=double, stat_scalar_t=double, index_t=int64_t]" at line 739 of /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cu /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cuh(122): warning #20054-D: dynamic initialization is not supported for a function-scope static __shared__ variable within a __device__/__global__ function scalar_t shared[32]; ^ detected during: instantiation of "scalar_t at::native::reduce(Op, PTA, int) [with scalar_t=at::native::Float2, Op=at::native::GradOp>, PTA=at::GenericPackedTensorAccessor]" at line 489 instantiation of "void at::native::batch_norm_backward_reduce_kernel(at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor) [with input_scalar_t=float, stat_scalar_t=float, stat_accscalar_t=float, index_t=int32_t]" at line 831 instantiation of "std::tuple at::native::batch_norm_backward_reduce_cuda_template(const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, __nv_bool, __nv_bool, __nv_bool) [with input_scalar_t=float, stat_scalar_t=float, index_t=int32_t]" at line 739 of /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cu /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cuh(122): warning #20054-D: dynamic initialization is not supported for a function-scope static __shared__ variable within a __device__/__global__ function scalar_t shared[32]; ^ detected during: instantiation of "scalar_t at::native::reduce(Op, PTA, int) [with scalar_t=at::native::Float2, Op=at::native::GradOp>, PTA=at::GenericPackedTensorAccessor]" at line 489 instantiation of "void at::native::batch_norm_backward_reduce_kernel(at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor) [with input_scalar_t=float, stat_scalar_t=float, stat_accscalar_t=float, index_t=int64_t]" at line 831 instantiation of "std::tuple at::native::batch_norm_backward_reduce_cuda_template(const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, __nv_bool, __nv_bool, __nv_bool) [with input_scalar_t=float, stat_scalar_t=float, index_t=int64_t]" at line 739 of /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cu /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cuh(122): warning #20054-D: dynamic initialization is not supported for a function-scope static __shared__ variable within a __device__/__global__ function scalar_t shared[32]; ^ detected during: instantiation of 
"scalar_t at::native::reduce(Op, PTA, int) [with scalar_t=at::native::Float2, Op=at::native::GradOp>, PTA=at::GenericPackedTensorAccessor]" at line 489 instantiation of "void at::native::batch_norm_backward_reduce_kernel(at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor) [with input_scalar_t=c10::Half, stat_scalar_t=float, stat_accscalar_t=float, index_t=int32_t]" at line 831 instantiation of "std::tuple at::native::batch_norm_backward_reduce_cuda_template(const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, __nv_bool, __nv_bool, __nv_bool) [with input_scalar_t=c10::Half, stat_scalar_t=float, index_t=int32_t]" at line 739 of /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cu /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cuh(122): warning #20054-D: dynamic initialization is not supported for a function-scope static __shared__ variable within a __device__/__global__ function scalar_t shared[32]; ^ detected during: instantiation of "scalar_t at::native::reduce(Op, PTA, int) [with scalar_t=at::native::Float2, Op=at::native::GradOp>, PTA=at::GenericPackedTensorAccessor]" at line 489 instantiation of "void at::native::batch_norm_backward_reduce_kernel(at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor) [with input_scalar_t=c10::Half, stat_scalar_t=float, stat_accscalar_t=float, index_t=int64_t]" at line 831 instantiation of "std::tuple at::native::batch_norm_backward_reduce_cuda_template(const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, __nv_bool, __nv_bool, __nv_bool) [with input_scalar_t=c10::Half, stat_scalar_t=float, index_t=int64_t]" at line 739 of /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cu /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cuh(122): warning #20054-D: dynamic initialization is not supported for a function-scope static __shared__ variable within a __device__/__global__ function scalar_t shared[32]; ^ detected during: instantiation of "scalar_t at::native::reduce(Op, PTA, int) [with scalar_t=at::native::Float2, Op=at::native::GradOp>, PTA=at::GenericPackedTensorAccessor]" at line 489 instantiation of "void at::native::batch_norm_backward_reduce_kernel(at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor) [with input_scalar_t=c10::BFloat16, stat_scalar_t=float, stat_accscalar_t=float, index_t=int32_t]" at line 831 instantiation of "std::tuple at::native::batch_norm_backward_reduce_cuda_template(const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, __nv_bool, __nv_bool, __nv_bool) [with input_scalar_t=c10::BFloat16, stat_scalar_t=float, index_t=int32_t]" at line 739 of /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cu /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cuh(122): 
warning #20054-D: dynamic initialization is not supported for a function-scope static __shared__ variable within a __device__/__global__ function scalar_t shared[32]; ^ detected during: instantiation of "scalar_t at::native::reduce(Op, PTA, int) [with scalar_t=at::native::Float2, Op=at::native::GradOp>, PTA=at::GenericPackedTensorAccessor]" at line 489 instantiation of "void at::native::batch_norm_backward_reduce_kernel(at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor) [with input_scalar_t=c10::BFloat16, stat_scalar_t=float, stat_accscalar_t=float, index_t=int64_t]" at line 831 instantiation of "std::tuple at::native::batch_norm_backward_reduce_cuda_template(const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, __nv_bool, __nv_bool, __nv_bool) [with input_scalar_t=c10::BFloat16, stat_scalar_t=float, index_t=int64_t]" at line 739 of /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cu /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cuh(122): warning #20054-D: dynamic initialization is not supported for a function-scope static __shared__ variable within a __device__/__global__ function scalar_t shared[32]; ^ detected during: instantiation of "scalar_t at::native::reduce(Op, PTA, int) [with scalar_t=at::native::Float2, Op=at::native::GradOp>, PTA=at::GenericPackedTensorAccessor]" at line 389 instantiation of "void at::native::batch_norm_backward_kernel(at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, __nv_bool, stat_accscalar_t) [with input_scalar_t=double, stat_scalar_t=double, stat_accscalar_t=double, index_t=int32_t]" at line 654 instantiation of "std::tuple at::native::batch_norm_backward_cuda_template(const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, __nv_bool, double, std::array<__nv_bool, 3UL>) [with input_scalar_t=double, stat_scalar_t=double, index_t=int32_t]" at line 585 of /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cu Remark: The warnings can be suppressed with "-diag-suppress " /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cuh(122): warning #20054-D: dynamic initialization is not supported for a function-scope static __shared__ variable within a __device__/__global__ function scalar_t shared[32]; ^ detected during: instantiation of "scalar_t at::native::reduce(Op, PTA, int) [with scalar_t=at::native::Float2, Op=at::native::GradOp>, PTA=at::GenericPackedTensorAccessor]" at line 389 instantiation of "void at::native::batch_norm_backward_kernel(at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, __nv_bool, stat_accscalar_t) [with input_scalar_t=float, stat_scalar_t=float, stat_accscalar_t=float, 
index_t=int32_t]" at line 654 instantiation of "std::tuple at::native::batch_norm_backward_cuda_template(const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, __nv_bool, double, std::array<__nv_bool, 3UL>) [with input_scalar_t=float, stat_scalar_t=float, index_t=int32_t]" at line 585 of /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cu /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cuh(122): warning #20054-D: dynamic initialization is not supported for a function-scope static __shared__ variable within a __device__/__global__ function scalar_t shared[32]; ^ detected during: instantiation of "scalar_t at::native::reduce(Op, PTA, int) [with scalar_t=at::native::Float2, Op=at::native::GradOp>, PTA=at::GenericPackedTensorAccessor]" at line 389 instantiation of "void at::native::batch_norm_backward_kernel(at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, __nv_bool, stat_accscalar_t) [with input_scalar_t=c10::Half, stat_scalar_t=float, stat_accscalar_t=float, index_t=int32_t]" at line 654 instantiation of "std::tuple at::native::batch_norm_backward_cuda_template(const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, __nv_bool, double, std::array<__nv_bool, 3UL>) [with input_scalar_t=c10::Half, stat_scalar_t=float, index_t=int32_t]" at line 585 of /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cu /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cuh(122): warning #20054-D: dynamic initialization is not supported for a function-scope static __shared__ variable within a __device__/__global__ function scalar_t shared[32]; ^ detected during: instantiation of "scalar_t at::native::reduce(Op, PTA, int) [with scalar_t=at::native::Float2, Op=at::native::GradOp>, PTA=at::GenericPackedTensorAccessor]" at line 389 instantiation of "void at::native::batch_norm_backward_kernel(at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, __nv_bool, stat_accscalar_t) [with input_scalar_t=c10::BFloat16, stat_scalar_t=float, stat_accscalar_t=float, index_t=int32_t]" at line 654 instantiation of "std::tuple at::native::batch_norm_backward_cuda_template(const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, __nv_bool, double, std::array<__nv_bool, 3UL>) [with input_scalar_t=c10::BFloat16, stat_scalar_t=float, index_t=int32_t]" at line 585 of /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cu /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cuh(122): warning #20054-D: dynamic initialization is not supported for a function-scope static __shared__ variable within a __device__/__global__ function scalar_t shared[32]; ^ detected during: instantiation of "scalar_t at::native::reduce(Op, PTA, int) [with 
scalar_t=at::native::Float2, Op=at::native::GradOp>, PTA=at::GenericPackedTensorAccessor]" at line 489 instantiation of "void at::native::batch_norm_backward_reduce_kernel(at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor) [with input_scalar_t=double, stat_scalar_t=double, stat_accscalar_t=double, index_t=int32_t]" at line 831 instantiation of "std::tuple at::native::batch_norm_backward_reduce_cuda_template(const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, __nv_bool, __nv_bool, __nv_bool) [with input_scalar_t=double, stat_scalar_t=double, index_t=int32_t]" at line 739 of /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cu /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cuh(122): warning #20054-D: dynamic initialization is not supported for a function-scope static __shared__ variable within a __device__/__global__ function scalar_t shared[32]; ^ detected during: instantiation of "scalar_t at::native::reduce(Op, PTA, int) [with scalar_t=at::native::Float2, Op=at::native::GradOp>, PTA=at::GenericPackedTensorAccessor]" at line 489 instantiation of "void at::native::batch_norm_backward_reduce_kernel(at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor) [with input_scalar_t=double, stat_scalar_t=double, stat_accscalar_t=double, index_t=int64_t]" at line 831 instantiation of "std::tuple at::native::batch_norm_backward_reduce_cuda_template(const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, __nv_bool, __nv_bool, __nv_bool) [with input_scalar_t=double, stat_scalar_t=double, index_t=int64_t]" at line 739 of /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cu /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cuh(122): warning #20054-D: dynamic initialization is not supported for a function-scope static __shared__ variable within a __device__/__global__ function scalar_t shared[32]; ^ detected during: instantiation of "scalar_t at::native::reduce(Op, PTA, int) [with scalar_t=at::native::Float2, Op=at::native::GradOp>, PTA=at::GenericPackedTensorAccessor]" at line 489 instantiation of "void at::native::batch_norm_backward_reduce_kernel(at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor) [with input_scalar_t=float, stat_scalar_t=float, stat_accscalar_t=float, index_t=int32_t]" at line 831 instantiation of "std::tuple at::native::batch_norm_backward_reduce_cuda_template(const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, __nv_bool, __nv_bool, __nv_bool) [with input_scalar_t=float, stat_scalar_t=float, index_t=int32_t]" at line 739 of /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cu /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cuh(122): warning #20054-D: dynamic initialization is not supported for a 
function-scope static __shared__ variable within a __device__/__global__ function scalar_t shared[32]; ^ detected during: instantiation of "scalar_t at::native::reduce(Op, PTA, int) [with scalar_t=at::native::Float2, Op=at::native::GradOp>, PTA=at::GenericPackedTensorAccessor]" at line 489 instantiation of "void at::native::batch_norm_backward_reduce_kernel(at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor) [with input_scalar_t=float, stat_scalar_t=float, stat_accscalar_t=float, index_t=int64_t]" at line 831 instantiation of "std::tuple at::native::batch_norm_backward_reduce_cuda_template(const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, __nv_bool, __nv_bool, __nv_bool) [with input_scalar_t=float, stat_scalar_t=float, index_t=int64_t]" at line 739 of /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cu /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cuh(122): warning #20054-D: dynamic initialization is not supported for a function-scope static __shared__ variable within a __device__/__global__ function scalar_t shared[32]; ^ detected during: instantiation of "scalar_t at::native::reduce(Op, PTA, int) [with scalar_t=at::native::Float2, Op=at::native::GradOp>, PTA=at::GenericPackedTensorAccessor]" at line 489 instantiation of "void at::native::batch_norm_backward_reduce_kernel(at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor) [with input_scalar_t=c10::Half, stat_scalar_t=float, stat_accscalar_t=float, index_t=int32_t]" at line 831 instantiation of "std::tuple at::native::batch_norm_backward_reduce_cuda_template(const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, __nv_bool, __nv_bool, __nv_bool) [with input_scalar_t=c10::Half, stat_scalar_t=float, index_t=int32_t]" at line 739 of /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cu /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cuh(122): warning #20054-D: dynamic initialization is not supported for a function-scope static __shared__ variable within a __device__/__global__ function scalar_t shared[32]; ^ detected during: instantiation of "scalar_t at::native::reduce(Op, PTA, int) [with scalar_t=at::native::Float2, Op=at::native::GradOp>, PTA=at::GenericPackedTensorAccessor]" at line 489 instantiation of "void at::native::batch_norm_backward_reduce_kernel(at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor) [with input_scalar_t=c10::Half, stat_scalar_t=float, stat_accscalar_t=float, index_t=int64_t]" at line 831 instantiation of "std::tuple at::native::batch_norm_backward_reduce_cuda_template(const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, __nv_bool, __nv_bool, __nv_bool) [with input_scalar_t=c10::Half, stat_scalar_t=float, index_t=int64_t]" at line 739 of 
/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cu /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cuh(122): warning #20054-D: dynamic initialization is not supported for a function-scope static __shared__ variable within a __device__/__global__ function scalar_t shared[32]; ^ detected during: instantiation of "scalar_t at::native::reduce(Op, PTA, int) [with scalar_t=at::native::Float2, Op=at::native::GradOp>, PTA=at::GenericPackedTensorAccessor]" at line 489 instantiation of "void at::native::batch_norm_backward_reduce_kernel(at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor) [with input_scalar_t=c10::BFloat16, stat_scalar_t=float, stat_accscalar_t=float, index_t=int32_t]" at line 831 instantiation of "std::tuple at::native::batch_norm_backward_reduce_cuda_template(const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, __nv_bool, __nv_bool, __nv_bool) [with input_scalar_t=c10::BFloat16, stat_scalar_t=float, index_t=int32_t]" at line 739 of /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cu /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cuh(122): warning #20054-D: dynamic initialization is not supported for a function-scope static __shared__ variable within a __device__/__global__ function scalar_t shared[32]; ^ detected during: instantiation of "scalar_t at::native::reduce(Op, PTA, int) [with scalar_t=at::native::Float2, Op=at::native::GradOp>, PTA=at::GenericPackedTensorAccessor]" at line 489 instantiation of "void at::native::batch_norm_backward_reduce_kernel(at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor) [with input_scalar_t=c10::BFloat16, stat_scalar_t=float, stat_accscalar_t=float, index_t=int64_t]" at line 831 instantiation of "std::tuple at::native::batch_norm_backward_reduce_cuda_template(const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, __nv_bool, __nv_bool, __nv_bool) [with input_scalar_t=c10::BFloat16, stat_scalar_t=float, index_t=int64_t]" at line 739 of /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cu /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cuh(122): warning #20054-D: dynamic initialization is not supported for a function-scope static __shared__ variable within a __device__/__global__ function scalar_t shared[32]; ^ detected during: instantiation of "scalar_t at::native::reduce(Op, PTA, int) [with scalar_t=at::native::Float2, Op=at::native::GradOp>, PTA=at::GenericPackedTensorAccessor]" at line 389 instantiation of "void at::native::batch_norm_backward_kernel(at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, __nv_bool, stat_accscalar_t) [with input_scalar_t=double, stat_scalar_t=double, stat_accscalar_t=double, index_t=int32_t]" at line 654 
instantiation of "std::tuple at::native::batch_norm_backward_cuda_template(const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, __nv_bool, double, std::array<__nv_bool, 3UL>) [with input_scalar_t=double, stat_scalar_t=double, index_t=int32_t]" at line 585 of /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cu Remark: The warnings can be suppressed with "-diag-suppress " /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cuh(122): warning #20054-D: dynamic initialization is not supported for a function-scope static __shared__ variable within a __device__/__global__ function scalar_t shared[32]; ^ detected during: instantiation of "scalar_t at::native::reduce(Op, PTA, int) [with scalar_t=at::native::Float2, Op=at::native::GradOp>, PTA=at::GenericPackedTensorAccessor]" at line 389 instantiation of "void at::native::batch_norm_backward_kernel(at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, __nv_bool, stat_accscalar_t) [with input_scalar_t=float, stat_scalar_t=float, stat_accscalar_t=float, index_t=int32_t]" at line 654 instantiation of "std::tuple at::native::batch_norm_backward_cuda_template(const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, __nv_bool, double, std::array<__nv_bool, 3UL>) [with input_scalar_t=float, stat_scalar_t=float, index_t=int32_t]" at line 585 of /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cu /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cuh(122): warning #20054-D: dynamic initialization is not supported for a function-scope static __shared__ variable within a __device__/__global__ function scalar_t shared[32]; ^ detected during: instantiation of "scalar_t at::native::reduce(Op, PTA, int) [with scalar_t=at::native::Float2, Op=at::native::GradOp>, PTA=at::GenericPackedTensorAccessor]" at line 389 instantiation of "void at::native::batch_norm_backward_kernel(at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, __nv_bool, stat_accscalar_t) [with input_scalar_t=c10::Half, stat_scalar_t=float, stat_accscalar_t=float, index_t=int32_t]" at line 654 instantiation of "std::tuple at::native::batch_norm_backward_cuda_template(const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, __nv_bool, double, std::array<__nv_bool, 3UL>) [with input_scalar_t=c10::Half, stat_scalar_t=float, index_t=int32_t]" at line 585 of /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cu /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cuh(122): warning #20054-D: dynamic initialization is not supported for a function-scope static __shared__ variable within a __device__/__global__ function scalar_t shared[32]; ^ detected during: instantiation of "scalar_t 
at::native::reduce(Op, PTA, int) [with scalar_t=at::native::Float2, Op=at::native::GradOp>, PTA=at::GenericPackedTensorAccessor]" at line 389 instantiation of "void at::native::batch_norm_backward_kernel(at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, __nv_bool, stat_accscalar_t) [with input_scalar_t=c10::BFloat16, stat_scalar_t=float, stat_accscalar_t=float, index_t=int32_t]" at line 654 instantiation of "std::tuple at::native::batch_norm_backward_cuda_template(const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, __nv_bool, double, std::array<__nv_bool, 3UL>) [with input_scalar_t=c10::BFloat16, stat_scalar_t=float, index_t=int32_t]" at line 585 of /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cu /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cuh(122): warning #20054-D: dynamic initialization is not supported for a function-scope static __shared__ variable within a __device__/__global__ function scalar_t shared[32]; ^ detected during: instantiation of "scalar_t at::native::reduce(Op, PTA, int) [with scalar_t=at::native::Float2, Op=at::native::GradOp>, PTA=at::GenericPackedTensorAccessor]" at line 489 instantiation of "void at::native::batch_norm_backward_reduce_kernel(at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor) [with input_scalar_t=double, stat_scalar_t=double, stat_accscalar_t=double, index_t=int32_t]" at line 831 instantiation of "std::tuple at::native::batch_norm_backward_reduce_cuda_template(const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, __nv_bool, __nv_bool, __nv_bool) [with input_scalar_t=double, stat_scalar_t=double, index_t=int32_t]" at line 739 of /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cu /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cuh(122): warning #20054-D: dynamic initialization is not supported for a function-scope static __shared__ variable within a __device__/__global__ function scalar_t shared[32]; ^ detected during: instantiation of "scalar_t at::native::reduce(Op, PTA, int) [with scalar_t=at::native::Float2, Op=at::native::GradOp>, PTA=at::GenericPackedTensorAccessor]" at line 489 instantiation of "void at::native::batch_norm_backward_reduce_kernel(at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor) [with input_scalar_t=double, stat_scalar_t=double, stat_accscalar_t=double, index_t=int64_t]" at line 831 instantiation of "std::tuple at::native::batch_norm_backward_reduce_cuda_template(const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, __nv_bool, __nv_bool, __nv_bool) [with input_scalar_t=double, stat_scalar_t=double, index_t=int64_t]" at line 739 of 
/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cu /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cuh(122): warning #20054-D: dynamic initialization is not supported for a function-scope static __shared__ variable within a __device__/__global__ function scalar_t shared[32]; ^ detected during: instantiation of "scalar_t at::native::reduce(Op, PTA, int) [with scalar_t=at::native::Float2, Op=at::native::GradOp>, PTA=at::GenericPackedTensorAccessor]" at line 489 instantiation of "void at::native::batch_norm_backward_reduce_kernel(at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor) [with input_scalar_t=float, stat_scalar_t=float, stat_accscalar_t=float, index_t=int32_t]" at line 831 instantiation of "std::tuple at::native::batch_norm_backward_reduce_cuda_template(const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, __nv_bool, __nv_bool, __nv_bool) [with input_scalar_t=float, stat_scalar_t=float, index_t=int32_t]" at line 739 of /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cu /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cuh(122): warning #20054-D: dynamic initialization is not supported for a function-scope static __shared__ variable within a __device__/__global__ function scalar_t shared[32]; ^ detected during: instantiation of "scalar_t at::native::reduce(Op, PTA, int) [with scalar_t=at::native::Float2, Op=at::native::GradOp>, PTA=at::GenericPackedTensorAccessor]" at line 489 instantiation of "void at::native::batch_norm_backward_reduce_kernel(at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor) [with input_scalar_t=float, stat_scalar_t=float, stat_accscalar_t=float, index_t=int64_t]" at line 831 instantiation of "std::tuple at::native::batch_norm_backward_reduce_cuda_template(const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, __nv_bool, __nv_bool, __nv_bool) [with input_scalar_t=float, stat_scalar_t=float, index_t=int64_t]" at line 739 of /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cu /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cuh(122): warning #20054-D: dynamic initialization is not supported for a function-scope static __shared__ variable within a __device__/__global__ function scalar_t shared[32]; ^ detected during: instantiation of "scalar_t at::native::reduce(Op, PTA, int) [with scalar_t=at::native::Float2, Op=at::native::GradOp>, PTA=at::GenericPackedTensorAccessor]" at line 489 instantiation of "void at::native::batch_norm_backward_reduce_kernel(at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor) [with input_scalar_t=c10::Half, stat_scalar_t=float, stat_accscalar_t=float, index_t=int32_t]" at line 831 instantiation of "std::tuple at::native::batch_norm_backward_reduce_cuda_template(const at::Tensor &, const at::Tensor &, 
const at::Tensor &, const at::Tensor &, const at::Tensor &, __nv_bool, __nv_bool, __nv_bool) [with input_scalar_t=c10::Half, stat_scalar_t=float, index_t=int32_t]" at line 739 of /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cu /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cuh(122): warning #20054-D: dynamic initialization is not supported for a function-scope static __shared__ variable within a __device__/__global__ function scalar_t shared[32]; ^ detected during: instantiation of "scalar_t at::native::reduce(Op, PTA, int) [with scalar_t=at::native::Float2, Op=at::native::GradOp>, PTA=at::GenericPackedTensorAccessor]" at line 489 instantiation of "void at::native::batch_norm_backward_reduce_kernel(at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor) [with input_scalar_t=c10::Half, stat_scalar_t=float, stat_accscalar_t=float, index_t=int64_t]" at line 831 instantiation of "std::tuple at::native::batch_norm_backward_reduce_cuda_template(const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, __nv_bool, __nv_bool, __nv_bool) [with input_scalar_t=c10::Half, stat_scalar_t=float, index_t=int64_t]" at line 739 of /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cu /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cuh(122): warning #20054-D: dynamic initialization is not supported for a function-scope static __shared__ variable within a __device__/__global__ function scalar_t shared[32]; ^ detected during: instantiation of "scalar_t at::native::reduce(Op, PTA, int) [with scalar_t=at::native::Float2, Op=at::native::GradOp>, PTA=at::GenericPackedTensorAccessor]" at line 489 instantiation of "void at::native::batch_norm_backward_reduce_kernel(at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor) [with input_scalar_t=c10::BFloat16, stat_scalar_t=float, stat_accscalar_t=float, index_t=int32_t]" at line 831 instantiation of "std::tuple at::native::batch_norm_backward_reduce_cuda_template(const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, __nv_bool, __nv_bool, __nv_bool) [with input_scalar_t=c10::BFloat16, stat_scalar_t=float, index_t=int32_t]" at line 739 of /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cu /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cuh(122): warning #20054-D: dynamic initialization is not supported for a function-scope static __shared__ variable within a __device__/__global__ function scalar_t shared[32]; ^ detected during: instantiation of "scalar_t at::native::reduce(Op, PTA, int) [with scalar_t=at::native::Float2, Op=at::native::GradOp>, PTA=at::GenericPackedTensorAccessor]" at line 489 instantiation of "void at::native::batch_norm_backward_reduce_kernel(at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor) [with 
input_scalar_t=c10::BFloat16, stat_scalar_t=float, stat_accscalar_t=float, index_t=int64_t]" at line 831 instantiation of "std::tuple at::native::batch_norm_backward_reduce_cuda_template(const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, __nv_bool, __nv_bool, __nv_bool) [with input_scalar_t=c10::BFloat16, stat_scalar_t=float, index_t=int64_t]" at line 739 of /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cu /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cuh(122): warning #20054-D: dynamic initialization is not supported for a function-scope static __shared__ variable within a __device__/__global__ function scalar_t shared[32]; ^ detected during: instantiation of "scalar_t at::native::reduce(Op, PTA, int) [with scalar_t=at::native::Float2, Op=at::native::GradOp>, PTA=at::GenericPackedTensorAccessor]" at line 389 instantiation of "void at::native::batch_norm_backward_kernel(at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, __nv_bool, stat_accscalar_t) [with input_scalar_t=double, stat_scalar_t=double, stat_accscalar_t=double, index_t=int32_t]" at line 654 instantiation of "std::tuple at::native::batch_norm_backward_cuda_template(const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, __nv_bool, double, std::array<__nv_bool, 3UL>) [with input_scalar_t=double, stat_scalar_t=double, index_t=int32_t]" at line 585 of /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cu Remark: The warnings can be suppressed with "-diag-suppress " /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cuh(122): warning #20054-D: dynamic initialization is not supported for a function-scope static __shared__ variable within a __device__/__global__ function scalar_t shared[32]; ^ detected during: instantiation of "scalar_t at::native::reduce(Op, PTA, int) [with scalar_t=at::native::Float2, Op=at::native::GradOp>, PTA=at::GenericPackedTensorAccessor]" at line 389 instantiation of "void at::native::batch_norm_backward_kernel(at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, __nv_bool, stat_accscalar_t) [with input_scalar_t=float, stat_scalar_t=float, stat_accscalar_t=float, index_t=int32_t]" at line 654 instantiation of "std::tuple at::native::batch_norm_backward_cuda_template(const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, __nv_bool, double, std::array<__nv_bool, 3UL>) [with input_scalar_t=float, stat_scalar_t=float, index_t=int32_t]" at line 585 of /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cu /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cuh(122): warning #20054-D: dynamic initialization is not supported for a function-scope static __shared__ variable within a __device__/__global__ function scalar_t shared[32]; ^ 
detected during: instantiation of "scalar_t at::native::reduce(Op, PTA, int) [with scalar_t=at::native::Float2, Op=at::native::GradOp>, PTA=at::GenericPackedTensorAccessor]" at line 389 instantiation of "void at::native::batch_norm_backward_kernel(at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, __nv_bool, stat_accscalar_t) [with input_scalar_t=c10::Half, stat_scalar_t=float, stat_accscalar_t=float, index_t=int32_t]" at line 654 instantiation of "std::tuple at::native::batch_norm_backward_cuda_template(const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, __nv_bool, double, std::array<__nv_bool, 3UL>) [with input_scalar_t=c10::Half, stat_scalar_t=float, index_t=int32_t]" at line 585 of /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cu /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cuh(122): warning #20054-D: dynamic initialization is not supported for a function-scope static __shared__ variable within a __device__/__global__ function scalar_t shared[32]; ^ detected during: instantiation of "scalar_t at::native::reduce(Op, PTA, int) [with scalar_t=at::native::Float2, Op=at::native::GradOp>, PTA=at::GenericPackedTensorAccessor]" at line 389 instantiation of "void at::native::batch_norm_backward_kernel(at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, __nv_bool, stat_accscalar_t) [with input_scalar_t=c10::BFloat16, stat_scalar_t=float, stat_accscalar_t=float, index_t=int32_t]" at line 654 instantiation of "std::tuple at::native::batch_norm_backward_cuda_template(const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, __nv_bool, double, std::array<__nv_bool, 3UL>) [with input_scalar_t=c10::BFloat16, stat_scalar_t=float, index_t=int32_t]" at line 585 of /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cu /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cuh(122): warning #20054-D: dynamic initialization is not supported for a function-scope static __shared__ variable within a __device__/__global__ function scalar_t shared[32]; ^ detected during: instantiation of "scalar_t at::native::reduce(Op, PTA, int) [with scalar_t=at::native::Float2, Op=at::native::GradOp>, PTA=at::GenericPackedTensorAccessor]" at line 489 instantiation of "void at::native::batch_norm_backward_reduce_kernel(at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor) [with input_scalar_t=double, stat_scalar_t=double, stat_accscalar_t=double, index_t=int32_t]" at line 831 instantiation of "std::tuple at::native::batch_norm_backward_reduce_cuda_template(const at::Tensor &, const at::Tensor &, const at::Tensor &, const 
at::Tensor &, const at::Tensor &, __nv_bool, __nv_bool, __nv_bool) [with input_scalar_t=double, stat_scalar_t=double, index_t=int32_t]" at line 739 of /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cu /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cuh(122): warning #20054-D: dynamic initialization is not supported for a function-scope static __shared__ variable within a __device__/__global__ function scalar_t shared[32]; ^ detected during: instantiation of "scalar_t at::native::reduce(Op, PTA, int) [with scalar_t=at::native::Float2, Op=at::native::GradOp>, PTA=at::GenericPackedTensorAccessor]" at line 489 instantiation of "void at::native::batch_norm_backward_reduce_kernel(at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor) [with input_scalar_t=double, stat_scalar_t=double, stat_accscalar_t=double, index_t=int64_t]" at line 831 instantiation of "std::tuple at::native::batch_norm_backward_reduce_cuda_template(const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, __nv_bool, __nv_bool, __nv_bool) [with input_scalar_t=double, stat_scalar_t=double, index_t=int64_t]" at line 739 of /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cu /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cuh(122): warning #20054-D: dynamic initialization is not supported for a function-scope static __shared__ variable within a __device__/__global__ function scalar_t shared[32]; ^ detected during: instantiation of "scalar_t at::native::reduce(Op, PTA, int) [with scalar_t=at::native::Float2, Op=at::native::GradOp>, PTA=at::GenericPackedTensorAccessor]" at line 489 instantiation of "void at::native::batch_norm_backward_reduce_kernel(at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor) [with input_scalar_t=float, stat_scalar_t=float, stat_accscalar_t=float, index_t=int32_t]" at line 831 instantiation of "std::tuple at::native::batch_norm_backward_reduce_cuda_template(const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, __nv_bool, __nv_bool, __nv_bool) [with input_scalar_t=float, stat_scalar_t=float, index_t=int32_t]" at line 739 of /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cu /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cuh(122): warning #20054-D: dynamic initialization is not supported for a function-scope static __shared__ variable within a __device__/__global__ function scalar_t shared[32]; ^ detected during: instantiation of "scalar_t at::native::reduce(Op, PTA, int) [with scalar_t=at::native::Float2, Op=at::native::GradOp>, PTA=at::GenericPackedTensorAccessor]" at line 489 instantiation of "void at::native::batch_norm_backward_reduce_kernel(at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor) [with input_scalar_t=float, stat_scalar_t=float, stat_accscalar_t=float, 
index_t=int64_t]" at line 831 instantiation of "std::tuple at::native::batch_norm_backward_reduce_cuda_template(const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, __nv_bool, __nv_bool, __nv_bool) [with input_scalar_t=float, stat_scalar_t=float, index_t=int64_t]" at line 739 of /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cu /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cuh(122): warning #20054-D: dynamic initialization is not supported for a function-scope static __shared__ variable within a __device__/__global__ function scalar_t shared[32]; ^ detected during: instantiation of "scalar_t at::native::reduce(Op, PTA, int) [with scalar_t=at::native::Float2, Op=at::native::GradOp>, PTA=at::GenericPackedTensorAccessor]" at line 489 instantiation of "void at::native::batch_norm_backward_reduce_kernel(at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor) [with input_scalar_t=c10::Half, stat_scalar_t=float, stat_accscalar_t=float, index_t=int32_t]" at line 831 instantiation of "std::tuple at::native::batch_norm_backward_reduce_cuda_template(const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, __nv_bool, __nv_bool, __nv_bool) [with input_scalar_t=c10::Half, stat_scalar_t=float, index_t=int32_t]" at line 739 of /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cu /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cuh(122): warning #20054-D: dynamic initialization is not supported for a function-scope static __shared__ variable within a __device__/__global__ function scalar_t shared[32]; ^ detected during: instantiation of "scalar_t at::native::reduce(Op, PTA, int) [with scalar_t=at::native::Float2, Op=at::native::GradOp>, PTA=at::GenericPackedTensorAccessor]" at line 489 instantiation of "void at::native::batch_norm_backward_reduce_kernel(at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor) [with input_scalar_t=c10::Half, stat_scalar_t=float, stat_accscalar_t=float, index_t=int64_t]" at line 831 instantiation of "std::tuple at::native::batch_norm_backward_reduce_cuda_template(const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, __nv_bool, __nv_bool, __nv_bool) [with input_scalar_t=c10::Half, stat_scalar_t=float, index_t=int64_t]" at line 739 of /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cu /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cuh(122): warning #20054-D: dynamic initialization is not supported for a function-scope static __shared__ variable within a __device__/__global__ function scalar_t shared[32]; ^ detected during: instantiation of "scalar_t at::native::reduce(Op, PTA, int) [with scalar_t=at::native::Float2, Op=at::native::GradOp>, PTA=at::GenericPackedTensorAccessor]" at line 489 instantiation of "void at::native::batch_norm_backward_reduce_kernel(at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, 
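Warning #20054-D concerns CUDA's rule that a function-scope __shared__ variable has static storage and never receives a dynamic initializer: an element type with a user-provided constructor (at::native::Float2 in the instantiations above) is therefore flagged, because the constructor does not run. A minimal sketch of the pattern and the usual way to keep it quiet; Pair and TrivialPair are hypothetical stand-ins, not the PyTorch types:

```cuda
// Minimal reproduction of nvcc warning #20054-D, not the PyTorch code.
#include <cstdio>

struct Pair {
    float v1, v2;
    __device__ Pair() : v1(0.f), v2(0.f) {}  // non-trivial ctor -> #20054-D
};

__global__ void warns() {
    __shared__ Pair shared[32];    // nvcc warns: the ctor is never executed
    shared[threadIdx.x].v1 = 1.f;  // slots must be written explicitly anyway
    shared[threadIdx.x].v2 = 2.f;
}

struct TrivialPair { float v1, v2; };  // no ctor -> trivially constructible

__global__ void quiet() {
    __shared__ TrivialPair shared[32];  // no dynamic init requested, no warning
    shared[threadIdx.x].v1 = 1.f;
    shared[threadIdx.x].v2 = 2.f;
}

int main() {
    warns<<<1, 32>>>();
    quiet<<<1, 32>>>();
    cudaDeviceSynchronize();
    std::printf("ok\n");
    return 0;
}
```

The pattern is usually benign when every shared slot is written before it is read, as in a typical warp reduction; per the Remark, the diagnostic can also be silenced wholesale with nvcc's -diag-suppress switch (e.g. -diag-suppress 20054).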
[ 93%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/RNN.cu.o
[ 93%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/Randperm.cu.o
[ 93%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/RangeFactories.cu.o
[ 93%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/RecordStream.cu.o
[ 93%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/Reduce.cu.o
[ 93%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/ReduceAMinMaxKernel.cu.o
[ 93%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/ReduceArgMaxKernel.cu.o
[ 93%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/ReduceArgMinKernel.cu.o
[ 93%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/ReduceLogicKernel.cu.o
[ 93%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/ReduceMaxValuesKernel.cu.o
[ 93%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/ReduceMinValuesKernel.cu.o
[ 93%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/ReduceMomentKernel.cu.o
[ 93%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/ReduceNormKernel.cu.o
[ 93%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/ReduceSumProdKernel.cu.o
[ 94%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/ReflectionPad.cu.o
[ 94%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/RenormKernel.cu.o
[ 94%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/Repeat.cu.o
[ 94%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/ReplicationPadding.cu.o
[ 94%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/RreluWithNoise.cu.o
[ 94%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/ScatterGatherKernel.cu.o
[ 94%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/SegmentReduce.cu.o
[ 94%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/Shape.cu.o
[ 94%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/SoftMax.cu.o
/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu(844): warning #191-D: type qualifier is meaningless on cast type
    [line 844 is the fully expanded "host_softmax" dispatch, echoed here in its entirety: a switch on input.scalar_type() with cases for Double, Float, Half and BFloat16, whose default case fails a TORCH_CHECK reporting the dtype as "not implemented". Each case binds scalar_t and accscalar_t and then, in both the !half_to_float and half_to_float branches: if dim_size <= 1024 and dim_size*sizeof(scalar_t) <= 4096, the rows are processed in chunks of (1 << 30)/dim_size through dispatch_softmax_forward; otherwise the code computes smem_reduction_sz = block.x / 32 * sizeof(accscalar_t), checks that dim_size fits in sharedMemPerBlock beside that buffer, that input_ptr and output_ptr are ALIGN_BYTES-aligned, and that dim_size is a multiple of ILP = sizeof(float4)/sizeof(scalar_t); if all hold it launches cunn_SoftMaxForwardSmem with smem_sz = dim_size*sizeof(scalar_t) + smem_reduction_sz, else cunn_SoftMaxForward, each launch followed by a cudaGetLastError()/c10_cuda_check_implementation check.]
               ^
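The chunking and shared-memory checks that recur in the echoed expansion are the substance of line 844's dispatch. A self-contained restatement of that eligibility test follows; it is a sketch under assumed names (can_use_smem_softmax, ALIGN_BYTES = 16), not the PyTorch implementation:

```cuda
// Restates the shared-memory eligibility test visible in the expansion above.
#include <cuda_runtime.h>
#include <cstdint>
#include <cstdio>

constexpr int ALIGN_BYTES = 16;  // assumed alignment for the vectorized kernels

template <typename scalar_t, typename accscalar_t>
bool can_use_smem_softmax(const void* input_ptr, const void* output_ptr,
                          long long dim_size, unsigned block_x) {
    // One accscalar_t slot per 32-thread warp is reserved for the block reduction.
    size_t smem_reduction_sz = block_x / 32 * sizeof(accscalar_t);

    cudaDeviceProp prop{};
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) return false;
    // How many row elements still fit beside the reduction buffer.
    size_t max_elements_per_smem =
        (prop.sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t);

    constexpr int ILP = sizeof(float4) / sizeof(scalar_t);  // vector width
    bool ok = dim_size < (long long)max_elements_per_smem;  // row fits in smem
    ok &= reinterpret_cast<std::uintptr_t>(input_ptr) % ALIGN_BYTES == 0;
    ok &= reinterpret_cast<std::uintptr_t>(output_ptr) % ALIGN_BYTES == 0;
    ok &= dim_size % ILP == 0;  // only whole vector-width loads are issued
    return ok;
}

int main() {
    static float buf[1024];
    std::printf("smem path usable: %d\n",
                can_use_smem_softmax<float, float>(buf, buf, 1024, 128));
    return 0;
}
```

If any check fails, the expansion in the log shows the code falling back to cunn_SoftMaxForward, which re-reads the row from global memory instead of staging it in shared memory.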
"/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str('"', at_dispatch_name, "\" not implemented for '", toString(_st), "'")))); }; } while (false); } }() ^ Remark: The warnings can be suppressed with "-diag-suppress " /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu(844): warning #191-D: type qualifier is meaningless on cast type [&] { const auto& the_type = input.scalar_type(); constexpr const char* at_dispatch_name = "host_softmax"; at::ScalarType _st = ::detail::scalar_type(the_type); ; switch (_st) { case at::ScalarType::Double: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Double)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Double), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = 
    [the same line-844 expansion is echoed again]
/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu(844): warning #191-D: type qualifier is meaningless on cast type
    [the same line-844 expansion is echoed a third time; its tail continues:] ... cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err),
"/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } default: do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str('"', at_dispatch_name, "\" not implemented for '", toString(_st), "'")))); }; } while (false); } }() ^ /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu(844): warning #191-D: type qualifier is meaningless on cast type [&] { const auto& the_type = input.scalar_type(); constexpr const char* at_dispatch_name = "host_softmax"; at::ScalarType _st = ::detail::scalar_type(the_type); ; switch (_st) { case at::ScalarType::Double: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Double)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Double), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += 
chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Float: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Float)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Float), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { 
dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Half: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Half)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Half), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && 
dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::BFloat16: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::BFloat16)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::BFloat16), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } 
while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } default: do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str('"', at_dispatch_name, "\" not implemented for '", toString(_st), "'")))); }; } while (false); } }() ^ /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu(844): warning #191-D: type qualifier is meaningless on cast type [&] { const auto& the_type = input.scalar_type(); constexpr const char* at_dispatch_name = "host_softmax"; at::ScalarType _st = ::detail::scalar_type(the_type); ; switch (_st) { case at::ScalarType::Double: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Double)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? 
If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Double), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Float: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Float)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), 
(::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Float), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Half: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Half)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { 
::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Half), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::BFloat16: { do { if constexpr 
(!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::BFloat16)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::BFloat16), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), 
"/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } default: do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str('"', at_dispatch_name, "\" not implemented for '", toString(_st), "'")))); }; } while (false); } }() ^ /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu(844): warning #191-D: type qualifier is meaningless on cast type [&] { const auto& the_type = input.scalar_type(); constexpr const char* at_dispatch_name = "host_softmax"; at::ScalarType _st = ::detail::scalar_type(the_type); ; switch (_st) { case at::ScalarType::Double: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Double)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Double), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += 
chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Float: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Float)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Float), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { 
dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Half: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Half)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Half), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && 
dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::BFloat16: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::BFloat16)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::BFloat16), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } 
while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } default: do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str('"', at_dispatch_name, "\" not implemented for '", toString(_st), "'")))); }; } while (false); } }() ^ /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu(844): warning #191-D: type qualifier is meaningless on cast type [&] { const auto& the_type = input.scalar_type(); constexpr const char* at_dispatch_name = "host_softmax"; at::ScalarType _st = ::detail::scalar_type(the_type); ; switch (_st) { case at::ScalarType::Double: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Double)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? 
If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Double), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Float: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Float)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), 
(::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Float), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Half: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Half)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { 
::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Half), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::BFloat16: { do { if constexpr 
(!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::BFloat16)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::BFloat16), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), 
"/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } default: do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str('"', at_dispatch_name, "\" not implemented for '", toString(_st), "'")))); }; } while (false); } }() ^ /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu(844): warning #191-D: type qualifier is meaningless on cast type [&] { const auto& the_type = input.scalar_type(); constexpr const char* at_dispatch_name = "host_softmax"; at::ScalarType _st = ::detail::scalar_type(the_type); ; switch (_st) { case at::ScalarType::Double: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Double)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Double), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += 
/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu(844): warning #191-D: type qualifier is meaningless on cast type
If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Double), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Float: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Float)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), 
(::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Float), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Half: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Half)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { 
::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Half), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::BFloat16: { do { if constexpr 
(!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::BFloat16)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::BFloat16), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), 
"/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } default: do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str('"', at_dispatch_name, "\" not implemented for '", toString(_st), "'")))); }; } while (false); } }() ^ Remark: The warnings can be suppressed with "-diag-suppress " /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu(844): warning #191-D: type qualifier is meaningless on cast type [&] { const auto& the_type = input.scalar_type(); constexpr const char* at_dispatch_name = "host_softmax"; at::ScalarType _st = ::detail::scalar_type(the_type); ; switch (_st) { case at::ScalarType::Double: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Double)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Double), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, 
/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu(844): warning #191-D: type qualifier is meaningless on cast type
/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu(844): warning #191-D: type qualifier is meaningless on cast type
[&] { const auto& the_type = input.scalar_type(); constexpr const char* at_dispatch_name = "host_softmax"; at::ScalarType _st = ::detail::scalar_type(the_type); ; switch (_st) { case at::ScalarType::Double: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Double)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast<uint32_t>(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Double), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT<at::ScalarType::Double>; return [&] { using accscalar_t = acc_type<scalar_t, true>; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr<scalar_t>(); auto input_ptr = input.const_data_ptr<scalar_t>(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward<scalar_t, scalar_t, accscalar_t, is_log_softmax, false>( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem =
/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu(844): warning #191-D: type qualifier is meaningless on cast type
  (identical macro-expansion echo of line 844 omitted)
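Behind the repeated echo, the small-row branch of host_softmax is simple: it hands the persistent softmax kernel at most about 2^30 elements per dispatch and walks the input in chunks. A self-contained sketch of that pattern, with a hypothetical process_rows() standing in for dispatch_softmax_forward:

    #include <algorithm>
    #include <cstdint>
    #include <cstdio>

    // Hypothetical stand-in for dispatch_softmax_forward: consumes `rows`
    // rows of `dim` contiguous elements each.
    static void process_rows(const float*, float*, std::int64_t dim, std::int64_t rows) {
      std::printf("dispatching %lld rows of %lld elements\n",
                  static_cast<long long>(rows), static_cast<long long>(dim));
    }

    // Mirrors the chunking loop in the expansion: chunk_rows * dim stays at
    // or below 2^30 elements per dispatch; the final chunk is clamped with
    // std::min, so a partial last chunk is handled correctly.
    void chunked_forward(const float* in, float* out, std::int64_t dim, std::int64_t outer) {
      std::int64_t remaining = outer;
      std::int64_t chunk_rows = (std::int64_t{1} << 30) / dim;
      while (remaining > 0) {
        process_rows(in, out, dim, std::min(remaining, chunk_rows));
        in  += chunk_rows * dim;
        out += chunk_rows * dim;
        remaining -= chunk_rows;
      }
    }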
If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Double), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Float: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Float)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), 
(::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Float), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Half: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Half)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { 
::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Half), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::BFloat16: { do { if constexpr 
(!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::BFloat16)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::BFloat16), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), 
"/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } default: do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str('"', at_dispatch_name, "\" not implemented for '", toString(_st), "'")))); }; } while (false); } }() ^ /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu(844): warning #191-D: type qualifier is meaningless on cast type [&] { const auto& the_type = input.scalar_type(); constexpr const char* at_dispatch_name = "host_softmax"; at::ScalarType _st = ::detail::scalar_type(the_type); ; switch (_st) { case at::ScalarType::Double: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Double)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Double), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += 
/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu(844): warning #191-D: type qualifier is meaningless on cast type
  (identical macro-expansion echo of line 844 omitted)
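Every kernel launch in the expansion is followed by a do { ... } while (0) block that polls cudaGetLastError(); that is the expanded form of PyTorch's kernel-launch check, wrapped in do/while(0) so the macro behaves as a single statement. A minimal standalone equivalent (macro and kernel names here are illustrative, not PyTorch's):

    #include <cstdio>
    #include <cuda_runtime.h>

    __global__ void fill_zero(float* out, int n) {
      int i = blockIdx.x * blockDim.x + threadIdx.x;
      if (i < n) out[i] = 0.0f;
    }

    // Poll the launch error immediately so a bad configuration (e.g. too much
    // dynamic shared memory) is reported at the call site, not at the next sync.
    #define LAUNCH_CHECK()                                             \
      do {                                                             \
        const cudaError_t err_ = cudaGetLastError();                   \
        if (err_ != cudaSuccess) {                                     \
          std::fprintf(stderr, "launch failed: %s (%s:%d)\n",          \
                       cudaGetErrorString(err_), __FILE__, __LINE__);  \
        }                                                              \
      } while (0)

    int main() {
      float* buf = nullptr;
      cudaMalloc(&buf, 256 * sizeof(float));
      fill_zero<<<1, 256>>>(buf, 256);
      LAUNCH_CHECK();
      cudaFree(buf);
      return 0;
    }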
If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Double), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Float: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Float)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), 
(::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Float), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Half: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Half)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { 
::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Half), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::BFloat16: { do { if constexpr 
(!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::BFloat16)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::BFloat16), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), 
"/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } default: do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str('"', at_dispatch_name, "\" not implemented for '", toString(_st), "'")))); }; } while (false); } }() ^ /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu(844): warning #191-D: type qualifier is meaningless on cast type [&] { const auto& the_type = input.scalar_type(); constexpr const char* at_dispatch_name = "host_softmax"; at::ScalarType _st = ::detail::scalar_type(the_type); ; switch (_st) { case at::ScalarType::Double: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Double)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Double), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += 
chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Float: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Float)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Float), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { 
dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Half: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Half)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Half), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && 
dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::BFloat16: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::BFloat16)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::BFloat16), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } 
while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } default: do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str('"', at_dispatch_name, "\" not implemented for '", toString(_st), "'")))); }; } while (false); } }() ^ /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu(844): warning #191-D: type qualifier is meaningless on cast type [&] { const auto& the_type = input.scalar_type(); constexpr const char* at_dispatch_name = "host_softmax"; at::ScalarType _st = ::detail::scalar_type(the_type); ; switch (_st) { case at::ScalarType::Double: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Double)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? 
If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Double), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Float: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Float)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), 
(::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Float), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Half: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Half)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { 
::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Half), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::BFloat16: { do { if constexpr 
(!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::BFloat16)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::BFloat16), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), 
"/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } default: do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str('"', at_dispatch_name, "\" not implemented for '", toString(_st), "'")))); }; } while (false); } }() ^ /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu(844): warning #191-D: type qualifier is meaningless on cast type [&] { const auto& the_type = input.scalar_type(); constexpr const char* at_dispatch_name = "host_softmax"; at::ScalarType _st = ::detail::scalar_type(the_type); ; switch (_st) { case at::ScalarType::Double: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Double)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Double), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += 
Remark: The warnings can be suppressed with "-diag-suppress <error-number>"
/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu(844): warning #191-D: type qualifier is meaningless on cast type [&] { const auto& the_type = input.scalar_type(); constexpr const char* at_dispatch_name = "host_softmax"; at::ScalarType _st = ::detail::scalar_type(the_type); ; switch (_st) { case at::ScalarType::Double: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Double)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved?
If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Double), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Float: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Float)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), 
(::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Float), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Half: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Half)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { 
::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Half), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::BFloat16: { do { if constexpr 
(!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::BFloat16)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::BFloat16), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), 
"/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } default: do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str('"', at_dispatch_name, "\" not implemented for '", toString(_st), "'")))); }; } while (false); } }() ^ /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu(844): warning #191-D: type qualifier is meaningless on cast type [&] { const auto& the_type = input.scalar_type(); constexpr const char* at_dispatch_name = "host_softmax"; at::ScalarType _st = ::detail::scalar_type(the_type); ; switch (_st) { case at::ScalarType::Double: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Double)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Double), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += 
/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu(844): warning #191-D: type qualifier is meaningless on cast type
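The long quoted source line in each of these warnings is the preprocessed expansion of ATen's dtype-dispatch macro for "host_softmax": a switch over the tensor's runtime scalar type (Double, Float, Half, BFloat16) that instantiates the same lambda body once per dtype, with a torchCheckFail default case for unsupported dtypes. A minimal sketch of that pattern; the names dispatch_by_dtype and this ScalarType enum are illustrative stand-ins, not the real ATen API:

#include <cstdio>
#include <stdexcept>

enum class ScalarType { Double, Float, Half, BFloat16 };   // illustrative stand-in

// Runtime dtype -> compile-time type: calls 'body' with a value whose static
// type plays the role of scalar_t in the expansion quoted above.
template <typename F>
void dispatch_by_dtype(ScalarType st, F&& body) {
    switch (st) {
        case ScalarType::Double: body(double{}); break;    // instantiates body for double
        case ScalarType::Float:  body(float{});  break;    // instantiates body for float
        // Half/BFloat16 would map to __half/__nv_bfloat16 in real CUDA code.
        default: throw std::runtime_error("dtype not implemented for this kernel");
    }
}

int main() {
    dispatch_by_dtype(ScalarType::Float, [](auto tag) {
        using scalar_t = decltype(tag);                    // as in 'using scalar_t = ...' above
        std::printf("sizeof(scalar_t) = %zu\n", sizeof(scalar_t));
    });
}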
"/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } default: do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str('"', at_dispatch_name, "\" not implemented for '", toString(_st), "'")))); }; } while (false); } }() ^ /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu(844): warning #191-D: type qualifier is meaningless on cast type [&] { const auto& the_type = input.scalar_type(); constexpr const char* at_dispatch_name = "host_softmax"; at::ScalarType _st = ::detail::scalar_type(the_type); ; switch (_st) { case at::ScalarType::Double: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Double)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Double), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += 
chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Float: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Float)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Float), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { 
dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Half: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Half)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Half), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && 
dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::BFloat16: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::BFloat16)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::BFloat16), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } 
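The expansion also shows the forward-softmax launch heuristic: rows of at most 1024 elements (and at most 4096 bytes) take the fused per-row kernel, launched in chunks so that a single launch never covers more than 2^30 elements; longer rows take the shared-memory kernel only when the whole row plus one per-warp reduction slot fits in shared memory, both pointers are ALIGN_BYTES-aligned, and the row length is a multiple of the vector width ILP. A self-contained sketch of that arithmetic, with the device limits hard-coded as assumptions rather than queried from CUDA as the real code does:

#include <algorithm>
#include <cstdint>
#include <cstdio>

int main() {
    // Example inputs; the real code gets these from the tensor and the device.
    const int64_t dim_size   = 2048;              // elements per softmax row
    const int64_t outer_size = 1 << 20;           // number of rows
    const size_t  elem_sz = sizeof(float);        // sizeof(scalar_t)
    const size_t  acc_sz  = sizeof(float);        // sizeof(accscalar_t)
    const unsigned block_x = 128;                 // block.x from SoftMaxForward_getBlockSize
    const size_t  smem_per_block = 48 * 1024;     // assumed sharedMemPerBlock
    const int     ILP = 16 / (int)elem_sz;        // sizeof(float4) / sizeof(scalar_t)

    if (dim_size <= 1024 && dim_size * (int64_t)elem_sz <= 4096) {
        // Fused path: chunk the rows so one launch covers at most 2^30 elements.
        int64_t chunk_size = (1L << 30) / dim_size;
        for (int64_t remaining = outer_size; remaining > 0; remaining -= chunk_size)
            std::printf("launch over %lld rows\n",
                        (long long)std::min(remaining, chunk_size));
    } else {
        size_t smem_reduction_sz = block_x / 32 * acc_sz;   // one slot per warp
        size_t max_elems = (smem_per_block - smem_reduction_sz) / elem_sz;
        bool can_use_smem = (uint64_t)dim_size < max_elems && dim_size % ILP == 0;
        // The real check additionally requires ALIGN_BYTES-aligned input and
        // output pointers before choosing the shared-memory kernel.
        std::printf("shared-memory kernel viable: %s\n", can_use_smem ? "yes" : "no");
    }
}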
/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu(844): warning #191-D: type qualifier is meaningless on cast type [&] { const auto& the_type = input.scalar_type(); constexpr const char* at_dispatch_name = "host_softmax"; at::ScalarType _st = ::detail::scalar_type(the_type); ; switch (_st) { case at::ScalarType::Double: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Double)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved?
If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Double), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Float: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Float)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), 
(::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Float), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Half: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Half)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { 
::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Half), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::BFloat16: { do { if constexpr 
(!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::BFloat16)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::BFloat16), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), 
"/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } default: do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str('"', at_dispatch_name, "\" not implemented for '", toString(_st), "'")))); }; } while (false); } }() ^ /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu(844): warning #191-D: type qualifier is meaningless on cast type [&] { const auto& the_type = input.scalar_type(); constexpr const char* at_dispatch_name = "host_softmax"; at::ScalarType _st = ::detail::scalar_type(the_type); ; switch (_st) { case at::ScalarType::Double: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Double)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Double), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += 
/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu(844): warning #191-D: type qualifier is meaningless on cast type
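
The wall of code quoted with each warning is not hand-written; it is the expansion of a single AT_DISPATCH site at SoftMax.cu:844: a switch over the runtime ScalarType in which each case fixes scalar_t at compile time and instantiates the same lambda body, with a default case that raises the "not implemented for ..." error seen in the dump. A simplified sketch of that dispatch pattern (an assumption-laden stand-in, not the real c10 macro machinery):

    // Simplified sketch of the dtype-dispatch pattern the expansion implements.
    #include <stdexcept>

    enum class ScalarType { Double, Float, Half, BFloat16 };

    template <typename F>
    void dispatch_floating(ScalarType st, F&& body) {
      switch (st) {
        // Each case fixes the element type at compile time and runs the
        // same body; unsupported dtypes fall through to the error branch,
        // mirroring the default case in the expansion above.
        case ScalarType::Double: body(double{}); break;
        case ScalarType::Float:  body(float{});  break;
        default: throw std::runtime_error("host_softmax: dtype not implemented");
      }
    }

    void softmax_entry(ScalarType st) {
      dispatch_floating(st, [](auto tag) {
        using scalar_t = decltype(tag);       // plays the role of scalar_t
        static_assert(sizeof(scalar_t) > 0);  // kernel launch would go here
      });
    }
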
If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Double), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Float: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Float)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), 
(::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Float), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Half: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Half)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { 
::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Half), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::BFloat16: { do { if constexpr 
(!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::BFloat16)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::BFloat16), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), 
"/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } default: do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str('"', at_dispatch_name, "\" not implemented for '", toString(_st), "'")))); }; } while (false); } }() ^ [ 94%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/Sort.cu.o /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu(844): warning #191-D: type qualifier is meaningless on cast type [&] { const auto& the_type = input.scalar_type(); constexpr const char* at_dispatch_name = "host_softmax"; at::ScalarType _st = ::detail::scalar_type(the_type); ; switch (_st) { case at::ScalarType::Double: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Double)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Double), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( 
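
For larger rows the expansion computes can_use_smem: the row must fit in the block's shared memory after reserving one accumulator slot per 32-thread warp for the reduction, both pointers must be ALIGN_BYTES-aligned, and dim_size must be divisible by the vector width ILP; otherwise the plain kernel is launched with only the reduction scratch. A host-side sketch of that predicate, with the device limit passed in rather than read from getCurrentDeviceProperties() (an assumption for self-containment):

    // Host-side sketch of the can_use_smem predicate from the expansion.
    #include <cstddef>
    #include <cstdint>

    constexpr std::size_t kAlignBytes = 16;  // sizeof(float4) in the source

    template <typename scalar_t, typename accscalar_t>
    bool can_use_smem(const void* in, const void* out, std::int64_t dim_size,
                      unsigned block_x, std::size_t shared_mem_per_block) {
      // One accscalar_t slot per warp is reserved for the block reduction;
      // whatever shared memory remains can cache the row itself.
      const std::size_t smem_reduction_sz = block_x / 32 * sizeof(accscalar_t);
      const std::size_t max_elems =
          (shared_mem_per_block - smem_reduction_sz) / sizeof(scalar_t);
      constexpr int ILP = 16 / sizeof(scalar_t);  // sizeof(float4)/sizeof(scalar_t)
      bool ok = static_cast<std::size_t>(dim_size) < max_elems;
      ok &= reinterpret_cast<std::uintptr_t>(in)  % kAlignBytes == 0;
      ok &= reinterpret_cast<std::uintptr_t>(out) % kAlignBytes == 0;
      ok &= dim_size % ILP == 0;
      return ok;
    }
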
If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::BFloat16), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } default: do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. 
" "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str('"', at_dispatch_name, "\" not implemented for '", toString(_st), "'")))); }; } while (false); } }() ^ Remark: The warnings can be suppressed with "-diag-suppress " /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu(844): warning #191-D: type qualifier is meaningless on cast type [&] { const auto& the_type = input.scalar_type(); constexpr const char* at_dispatch_name = "host_softmax"; at::ScalarType _st = ::detail::scalar_type(the_type); ; switch (_st) { case at::ScalarType::Double: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Double)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Double), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - 
smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Float: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Float)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Float), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = 
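Every #191-D instance above and below points at the same line, SoftMax.cu:844, which is the AT_DISPATCH block of host_softmax echoed above; the flagged construct is evidently the top-level const in reinterpret_cast<const uintptr_t>(input_ptr), a qualifier that has no effect on a cast's target type and is otherwise harmless. The echoed dispatch itself is worth a gloss: rows with dim_size <= 1024 and at most 4096 bytes take the persistent-softmax path, launched in chunks of at most 2^30 elements; everything else goes to a block-per-row kernel. Below is a minimal standalone C++ sketch of that chunking loop, illustrative only: run_rows_chunk stands in for dispatch_softmax_forward and all concrete sizes are invented.

// Sketch of the chunked launch loop from the expansion above (illustrative
// only; run_rows_chunk is a hypothetical stand-in for a kernel launch).
#include <algorithm>
#include <cstdint>
#include <vector>

// Stand-in for one launch over `rows` rows of `dim_size` elements each.
static void run_rows_chunk(float* dst, const float* src,
                           int64_t dim_size, int64_t rows) {
  for (int64_t i = 0; i < rows * dim_size; ++i) dst[i] = src[i];  // placeholder work
}

int main() {
  const int64_t dim_size = 256;      // elements per softmax row (assumed)
  const int64_t outer_size = 10000;  // number of rows (assumed)
  std::vector<float> in(static_cast<size_t>(dim_size * outer_size), 1.0f);
  std::vector<float> out(in.size());

  const float* input_ptr = in.data();
  float* output_ptr = out.data();

  // Cap each launch so it touches at most 2^30 elements, as in the log.
  const int64_t chunk_size = (int64_t{1} << 30) / dim_size;
  int64_t remaining = outer_size;
  while (remaining > 0) {
    const int64_t rows = std::min(remaining, chunk_size);
    run_rows_chunk(output_ptr, input_ptr, dim_size, rows);
    input_ptr += rows * dim_size;   // the echoed loop advances by chunk_size
    output_ptr += rows * dim_size;  // unconditionally; advancing by `rows`
    remaining -= rows;              // avoids a past-the-end pointer here
  }
  return 0;
}

Capping each launch at 2^30 elements presumably keeps per-launch index arithmetic inside 32-bit range. The echoed loop's unconditional pointer advance is still safe in the original because the pointers are dead after the final iteration.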
/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu(844): warning #191-D: type qualifier is meaningless on cast type
/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu(844): warning #191-D: type qualifier is meaningless on cast type
/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu(844): warning #191-D: type qualifier is meaningless on cast type
/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu(844): warning #191-D: type qualifier is meaningless on cast type
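The non-persistent branch in the echoed expansion chooses between cunn_SoftMaxForwardSmem and cunn_SoftMaxForward by testing, in order: whether the row fits in sharedMemPerBlock after reserving one accumulator per 32-thread warp of reduction scratch, whether both pointers are 16-byte (float4) aligned, and whether dim_size divides evenly into float4 packets (ILP elements per access). A compact plain-C++ restatement of that test follows; the device figures in main() are assumed, not queried from hardware.

// Restatement of the can_use_smem test from the echoed source (a sketch,
// assuming float4-width alignment and one accumulator slot per warp).
#include <cstdint>
#include <cstdio>

constexpr int kAlignBytes = 16;  // sizeof(float4): the vectorized access width

// dim_size: row length; elem_size/acc_size: sizeof(scalar_t)/sizeof(accscalar_t);
// smem_per_block: device sharedMemPerBlock; block_x: threads per block.
bool can_use_smem(std::int64_t dim_size, std::size_t elem_size,
                  std::size_t acc_size, std::size_t smem_per_block,
                  unsigned block_x, const void* in, const void* out) {
  // Reserve the reduction scratch first: one accumulator per warp in the block.
  const std::size_t smem_reduction_sz = block_x / 32 * acc_size;
  const std::size_t max_elements = (smem_per_block - smem_reduction_sz) / elem_size;
  const int ilp = kAlignBytes / static_cast<int>(elem_size);  // elems per float4
  bool ok = static_cast<std::size_t>(dim_size) < max_elements;  // row fits
  ok &= reinterpret_cast<std::uintptr_t>(in) % kAlignBytes == 0;   // aligned loads
  ok &= reinterpret_cast<std::uintptr_t>(out) % kAlignBytes == 0;  // aligned stores
  ok &= dim_size % ilp == 0;  // whole float4 packets only
  return ok;
}

int main() {
  alignas(16) static float in[2048], out[2048];
  // 48 KiB of shared memory and 128-thread blocks are assumed figures.
  std::printf("can_use_smem: %d\n",
              can_use_smem(2048, sizeof(float), sizeof(float),
                           48 * 1024, 128, in, out));
  return 0;
}

When any of the three conditions fails, the code in the log falls back to the plain kernel, which only needs the reduction scratch in shared memory.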
(!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::BFloat16)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::BFloat16), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), 
"/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } default: do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str('"', at_dispatch_name, "\" not implemented for '", toString(_st), "'")))); }; } while (false); } }() ^ /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu(844): warning #191-D: type qualifier is meaningless on cast type [&] { const auto& the_type = input.scalar_type(); constexpr const char* at_dispatch_name = "host_softmax"; at::ScalarType _st = ::detail::scalar_type(the_type); ; switch (_st) { case at::ScalarType::Double: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Double)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Double), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += 
chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Float: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Float)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Float), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { 
dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Half: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Half)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Half), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && 
dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::BFloat16: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::BFloat16)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::BFloat16), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } 
while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } default: do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str('"', at_dispatch_name, "\" not implemented for '", toString(_st), "'")))); }; } while (false); } }() ^ /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu(844): warning #191-D: type qualifier is meaningless on cast type [&] { const auto& the_type = input.scalar_type(); constexpr const char* at_dispatch_name = "host_softmax"; at::ScalarType _st = ::detail::scalar_type(the_type); ; switch (_st) { case at::ScalarType::Double: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Double)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? 
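As for what the diagnostic itself flags: #191-D fires when a cast names a cv-qualified non-class type, because such a cast yields a prvalue and the qualifier is discarded. A minimal sketch, not from this build; the reinterpret_cast<const std::uintptr_t> form is an assumption inferred from the alignment checks above, where the log stripped the cast's target type:

    #include <cstdint>

    // Hypothetical minimal repro of nvcc warning #191-D ("type qualifier is
    // meaningless on cast type"): the 'const' qualifies the cast's target
    // type, which has no effect on a non-class prvalue.
    bool is_aligned16(const float* p) {
        return reinterpret_cast<const std::uintptr_t>(p) % 16 == 0;  // warns: #191-D
        // warning-free equivalent:
        // return reinterpret_cast<std::uintptr_t>(p) % 16 == 0;
    }

The warning is benign; dropping the const from the cast type silences it without changing behavior.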
If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Double), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Float: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Float)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), 
(::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Float), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Half: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Half)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { 
::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Half), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::BFloat16: { do { if constexpr 
(!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::BFloat16)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::BFloat16), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), 
"/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } default: do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str('"', at_dispatch_name, "\" not implemented for '", toString(_st), "'")))); }; } while (false); } }() ^ /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu(844): warning #191-D: type qualifier is meaningless on cast type [&] { const auto& the_type = input.scalar_type(); constexpr const char* at_dispatch_name = "host_softmax"; at::ScalarType _st = ::detail::scalar_type(the_type); ; switch (_st) { case at::ScalarType::Double: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Double)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Double), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += 
chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Float: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Float)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Float), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { 
dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Half: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Half)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Half), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && 
dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::BFloat16: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::BFloat16)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::BFloat16), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } 
while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } default: do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str('"', at_dispatch_name, "\" not implemented for '", toString(_st), "'")))); }; } while (false); } }() ^ /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu(844): warning #191-D: type qualifier is meaningless on cast type [&] { const auto& the_type = input.scalar_type(); constexpr const char* at_dispatch_name = "host_softmax"; at::ScalarType _st = ::detail::scalar_type(the_type); ; switch (_st) { case at::ScalarType::Double: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Double)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? 
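The repeated expansion also documents the forward-softmax launch heuristic: rows of at most 1024 elements (and at most 4 KiB) go through the fused dispatch_softmax_forward path in chunks, while larger rows use the shared-memory kernel only when the row fits in the per-block budget left after the warp-reduction scratch and the pointers and row length satisfy the vectorization constraints. A standalone sketch of that eligibility test, with hypothetical parameters standing in for the tensor and device properties the real code reads (names ours, not PyTorch API):

    #include <cstddef>
    #include <cstdint>

    constexpr std::size_t ALIGN_BYTES = 16;  // float4-sized vector accesses
    constexpr unsigned WARP_SIZE = 32;

    // Mirrors the can_use_smem computation visible in the expansion above.
    bool can_use_smem(std::size_t dim_size, std::size_t elem_size,
                      std::size_t acc_size, const void* in, const void* out,
                      unsigned block_x, std::size_t smem_per_block) {
        const std::size_t ilp = 16 / elem_size;                            // elements per float4 (ILP)
        const std::size_t reduction_sz = block_x / WARP_SIZE * acc_size;   // warp-reduction scratch
        if (smem_per_block < reduction_sz) return false;
        const std::size_t max_elems = (smem_per_block - reduction_sz) / elem_size;
        return dim_size < max_elems
            && reinterpret_cast<std::uintptr_t>(in)  % ALIGN_BYTES == 0
            && reinterpret_cast<std::uintptr_t>(out) % ALIGN_BYTES == 0
            && dim_size % ilp == 0;
    }

As an illustrative data point, with fp16 inputs (elem_size = 2, acc_size = 4 for the float accumulator), a 128-thread block on a device with 48 KiB of shared memory per block admits rows up to roughly 24K elements before falling back to the non-smem kernel.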
If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Double), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Float: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Float)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), 
(::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Float), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Half: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Half)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { 
::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Half), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::BFloat16: { do { if constexpr 
(!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::BFloat16)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::BFloat16), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), 
"/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } default: do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str('"', at_dispatch_name, "\" not implemented for '", toString(_st), "'")))); }; } while (false); } }() ^ Remark: The warnings can be suppressed with "-diag-suppress " /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu(844): warning #191-D: type qualifier is meaningless on cast type [&] { const auto& the_type = input.scalar_type(); constexpr const char* at_dispatch_name = "host_softmax"; at::ScalarType _st = ::detail::scalar_type(the_type); ; switch (_st) { case at::ScalarType::Double: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Double)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Double), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, 
/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu(844): warning #191-D: type qualifier is meaningless on cast type
^
/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu(844): warning #191-D: type qualifier is meaningless on cast type
^
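Inside each dtype arm, the repeated lambda body is a three-way kernel choice: rows of at most 1024 elements that fit in 4 KiB go to the chunked warp-per-row path (at most (1 << 30) / dim_size rows per launch); larger rows use the shared-memory kernel when the whole row fits beside the per-warp reduction scratch and both pointers are ALIGN_BYTES-aligned and dim_size is divisible by the vector width ILP = sizeof(float4) / sizeof(scalar_t); otherwise the global-memory kernel runs. The same decision, lifted out as a standalone sketch (the enum and function names are ours):

    #include <cstddef>
    #include <cstdint>

    enum class SoftmaxKernel { WarpPerRowChunked, SharedMemory, GlobalMemory };

    SoftmaxKernel pick_kernel(std::int64_t dim_size, std::size_t elem_size,
                              std::size_t smem_per_block, std::size_t smem_reduction_sz,
                              std::uintptr_t in_addr, std::uintptr_t out_addr,
                              int ilp, int align_bytes) {
      // Small rows: persistent warp-level softmax, launched in chunks so a
      // single launch never covers more than 2^30 input elements.
      if (dim_size <= 1024 && dim_size * elem_size <= 4096)
        return SoftmaxKernel::WarpPerRowChunked;
      // Bigger rows: keep the row in shared memory if it fits next to the
      // per-warp reduction scratch and the pointers allow vectorized access.
      std::size_t max_elems = (smem_per_block - smem_reduction_sz) / elem_size;
      bool can_use_smem = static_cast<std::size_t>(dim_size) < max_elems;
      can_use_smem &= (in_addr % align_bytes) == 0;
      can_use_smem &= (out_addr % align_bytes) == 0;
      can_use_smem &= (dim_size % ilp) == 0;
      return can_use_smem ? SoftmaxKernel::SharedMemory : SoftmaxKernel::GlobalMemory;
    }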
If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Double), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Float: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Float)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), 
(::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Float), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Half: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Half)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { 
::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Half), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::BFloat16: { do { if constexpr 
(!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::BFloat16)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::BFloat16), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), 
"/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } default: do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str('"', at_dispatch_name, "\" not implemented for '", toString(_st), "'")))); }; } while (false); } }() ^ /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu(844): warning #191-D: type qualifier is meaningless on cast type [&] { const auto& the_type = input.scalar_type(); constexpr const char* at_dispatch_name = "host_softmax"; at::ScalarType _st = ::detail::scalar_type(the_type); ; switch (_st) { case at::ScalarType::Double: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Double)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Double), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += 
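For orientation, each repeated [&] { switch (_st) ... }() block in this log is the expansion of ATen's dtype dispatch macro (AT_DISPATCH_FLOATING_TYPES_AND2 at this call site): an immediately invoked lambda that switches on the runtime ScalarType and instantiates the kernel body once per supported dtype. A simplified sketch of that shape, with illustrative names (the real macro lives in ATen/Dispatch.h):

    #include <stdexcept>

    enum class ScalarType { Double, Float, Half, BFloat16 };

    // Stand-in for the per-dtype kernel body the macro stamps out.
    template <typename scalar_t>
    void softmax_body() { /* kernel launch for scalar_t goes here */ }

    void dispatch(ScalarType st) {
        // Immediately invoked lambda, mirroring `[&] { switch (_st) ... }()`.
        [&] {
            switch (st) {
                case ScalarType::Double: return softmax_body<double>();
                case ScalarType::Float:  return softmax_body<float>();
                // Half and BFloat16 cases follow the same pattern.
                default: throw std::runtime_error("not implemented for dtype");
            }
        }();
    }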
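Inside each case, the fast path (rows with dim_size <= 1024 that fit in 4096 bytes) never launches over more than 2^30 elements at a time; the loop visible in the expansion walks the input in chunks of (1 << 30) / dim_size rows. The arithmetic as a standalone sketch; the function and parameter names here are mine, not PyTorch's:

    #include <algorithm>
    #include <cstdint>

    // Invoke `launch(first_row, n_rows)` repeatedly so that no single call
    // covers more than (1 << 30) elements.
    template <typename LaunchFn>
    void launch_in_chunks(std::int64_t outer_size, std::int64_t dim_size,
                          LaunchFn launch) {
        std::int64_t remaining = outer_size;  // rows still to process
        const std::int64_t chunk_size = (std::int64_t{1} << 30) / dim_size;
        std::int64_t row = 0;
        while (remaining > 0) {
            launch(row, std::min(remaining, chunk_size));  // one kernel launch
            row += chunk_size;
            remaining -= chunk_size;
        }
    }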
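On the general path, the expansion chooses between a shared-memory kernel (cunn_SoftMaxForwardSmem) and a plain one (cunn_SoftMaxForward). The eligibility test requires the row to fit in shared memory alongside the per-warp reduction scratch, both pointers to be ALIGN_BYTES-aligned for vectorized loads, and dim_size to be a multiple of the per-thread vector width ILP. The same test as a standalone sketch; the parameter names and the value of kAlignBytes are assumptions here (in the real code sharedMemPerBlock comes from cudaDeviceProp):

    #include <cstddef>
    #include <cstdint>

    constexpr std::size_t kAlignBytes = 16;  // assumed: alignment of float4 loads

    bool can_use_smem_path(std::size_t shared_mem_per_block,  // cudaDeviceProp value
                           unsigned block_x, std::int64_t dim_size,
                           const void* in, const void* out,
                           std::size_t scalar_bytes, std::size_t acc_bytes,
                           int ilp) {
        // Per-warp reduction scratch shares the dynamic allocation (warp = 32).
        const std::size_t smem_reduction_sz = block_x / 32 * acc_bytes;
        const std::size_t max_elems =
            (shared_mem_per_block - smem_reduction_sz) / scalar_bytes;
        bool ok = static_cast<std::size_t>(dim_size) < max_elems;        // row fits
        ok &= reinterpret_cast<std::uintptr_t>(in) % kAlignBytes == 0;   // vector loads
        ok &= reinterpret_cast<std::uintptr_t>(out) % kAlignBytes == 0;
        ok &= dim_size % ilp == 0;                                       // whole vectors
        return ok;
    }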
"/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } default: do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str('"', at_dispatch_name, "\" not implemented for '", toString(_st), "'")))); }; } while (false); } }() ^ /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu(844): warning #191-D: type qualifier is meaningless on cast type [&] { const auto& the_type = input.scalar_type(); constexpr const char* at_dispatch_name = "host_softmax"; at::ScalarType _st = ::detail::scalar_type(the_type); ; switch (_st) { case at::ScalarType::Double: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Double)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Double), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += 
chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Float: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Float)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Float), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { 
dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Half: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Half)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Half), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && 
dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::BFloat16: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::BFloat16)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::BFloat16), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } 
while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } default: do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str('"', at_dispatch_name, "\" not implemented for '", toString(_st), "'")))); }; } while (false); } }() ^ Remark: The warnings can be suppressed with "-diag-suppress " /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu(844): warning #191-D: type qualifier is meaningless on cast type [&] { const auto& the_type = input.scalar_type(); constexpr const char* at_dispatch_name = "host_softmax"; at::ScalarType _st = ::detail::scalar_type(the_type); ; switch (_st) { case at::ScalarType::Double: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Double)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? 
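Note: warning #191-D is cosmetic. The alignment test in the expanded body casts a pointer to a const-qualified integer type, and a cv-qualifier on a cast's target type is simply discarded, which is what nvcc flags. A minimal standalone sketch that reproduces the diagnostic (function name and constant are illustrative, not from this build):

    #include <cstdint>

    constexpr int ALIGN_BYTES = 16;

    bool is_aligned(const float* p) {
        // nvcc: warning #191-D -- the "const" on the cast type is meaningless,
        // because the cast yields a prvalue; plain std::uintptr_t silences it.
        return reinterpret_cast<const std::uintptr_t>(p) % ALIGN_BYTES == 0;
    }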
If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Double), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Float: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Float)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), 
(::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Float), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Half: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Half)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { 
::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Half), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::BFloat16: { do { if constexpr 
(!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::BFloat16)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::BFloat16), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), 
"/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } default: do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str('"', at_dispatch_name, "\" not implemented for '", toString(_st), "'")))); }; } while (false); } }() ^ /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu(844): warning #191-D: type qualifier is meaningless on cast type [&] { const auto& the_type = input.scalar_type(); constexpr const char* at_dispatch_name = "host_softmax"; at::ScalarType _st = ::detail::scalar_type(the_type); ; switch (_st) { case at::ScalarType::Double: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Double)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Double), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += 
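Note: inside each case, rows of at most 1024 elements and 4 KiB take the fused per-row kernel; larger rows take the shared-memory kernel only when the row fits beside the per-warp reduction scratch and both buffers permit vectorized access. A hedged restatement of that gate (helper name and parameters invented for illustration; assumes ALIGN_BYTES is 16, i.e. sizeof(float4)):

    #include <cstddef>
    #include <cstdint>

    template <typename scalar_t, typename accscalar_t>
    bool softmax_can_use_smem(std::int64_t dim_size, const void* in, const void* out,
                              std::size_t smem_per_block, unsigned block_x) {
        constexpr int ILP = 16 / sizeof(scalar_t);                      // elements per 16-byte load
        std::size_t reduction_sz = block_x / 32 * sizeof(accscalar_t);  // one scratch slot per warp
        std::size_t max_elems = (smem_per_block - reduction_sz) / sizeof(scalar_t);
        bool ok = static_cast<std::size_t>(dim_size) < max_elems;       // row fits beside scratch
        ok &= reinterpret_cast<std::uintptr_t>(in)  % 16 == 0;          // 16-byte alignment needed
        ok &= reinterpret_cast<std::uintptr_t>(out) % 16 == 0;          // for vectorized (ILP) access
        ok &= dim_size % ILP == 0;                                      // whole vector chunks only
        return ok;
    }

A row that passes gets the smem kernel with dim_size*sizeof(scalar_t) plus the reduction scratch as dynamic shared memory; otherwise the global-memory kernel runs with only the scratch.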
If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Double), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Float: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Float)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), 
(::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Float), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Half: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Half)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { 
::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Half), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::BFloat16: { do { if constexpr 
(!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::BFloat16)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::BFloat16), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), 
"/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } default: do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str('"', at_dispatch_name, "\" not implemented for '", toString(_st), "'")))); }; } while (false); } }() ^ /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu(844): warning #191-D: type qualifier is meaningless on cast type [&] { const auto& the_type = input.scalar_type(); constexpr const char* at_dispatch_name = "host_softmax"; at::ScalarType _st = ::detail::scalar_type(the_type); ; switch (_st) { case at::ScalarType::Double: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Double)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Double), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += 
chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Float: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Float)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Float), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { 
dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Half: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Half)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Half), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && 
dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::BFloat16: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::BFloat16)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::BFloat16), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } 
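[note: nvcc's #191-D fires when the target type of a cast carries a top-level cv-qualifier; the qualifier is discarded because a cast to a non-reference object type yields a prvalue. A minimal, hypothetical repro (illustrative only, not the PyTorch source; within the host_softmax expansion a plausible trigger is the reinterpret_cast used for the ALIGN_BYTES alignment check, whose template argument, like all angle-bracketed text in this log such as the emptied <<>> launch configs, was stripped during capture):

    #include <cstdint>

    // Sketch: the 'const' on the cast's target type is meaningless and
    // draws nvcc/EDG warning #191-D; dropping it silences the warning.
    bool aligned_16(const float* p) {
        return reinterpret_cast<const std::uintptr_t>(p) % 16 == 0;  // #191-D
        // return reinterpret_cast<std::uintptr_t>(p) % 16 == 0;     // clean
    }

The code echoed after each warning header is the preprocessed dtype-dispatch switch from SoftMax.cu line 844: one identically shaped kernel-launch branch per supported dtype (Double, Float, Half, BFloat16), which is why each echo is long and internally repetitive. A simplified, self-contained sketch of that pattern (not the exact AT_DISPATCH expansion):

    #include <cstdio>

    enum class ScalarType { Double, Float, Half, BFloat16 };

    // Each branch runs the same templated body with a different static type,
    // mirroring the per-dtype cases visible in the echoed expansion.
    template <typename scalar_t>
    void run_softmax_stub() { std::printf("sizeof(scalar_t)=%zu\n", sizeof(scalar_t)); }

    void dispatch(ScalarType st) {
        switch (st) {
            case ScalarType::Double: run_softmax_stub<double>(); break;
            case ScalarType::Float:  run_softmax_stub<float>();  break;
            default: break;  // Half/BFloat16 use ATen types; omitted here
        }
    }

    int main() { dispatch(ScalarType::Float); }
]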
[... expansion echo and further identical warning/echo/caret repetitions omitted ...]
/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu(844): warning #191-D: type qualifier is meaningless on cast type
[&] { const auto& the_type = input.scalar_type(); constexpr const char* at_dispatch_name = "host_softmax"; at::ScalarType _st = ::detail::scalar_type(the_type); ; switch (_st) { case at::ScalarType::Double: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Double)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved?
If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Double), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Float: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Float)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), 
(::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Float), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Half: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Half)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { 
::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Half), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::BFloat16: { do { if constexpr 
(!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::BFloat16)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::BFloat16), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), 
"/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } default: do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str('"', at_dispatch_name, "\" not implemented for '", toString(_st), "'")))); }; } while (false); } }() ^ [ 94%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/SortImpl.cu.o /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu(844): warning #191-D: type qualifier is meaningless on cast type [&] { const auto& the_type = input.scalar_type(); constexpr const char* at_dispatch_name = "host_softmax"; at::ScalarType _st = ::detail::scalar_type(the_type); ; switch (_st) { case at::ScalarType::Double: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Double)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Double), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( 
" "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str('"', at_dispatch_name, "\" not implemented for '", toString(_st), "'")))); }; } while (false); } }() ^ Remark: The warnings can be suppressed with "-diag-suppress " /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu(844): warning #191-D: type qualifier is meaningless on cast type [&] { const auto& the_type = input.scalar_type(); constexpr const char* at_dispatch_name = "host_softmax"; at::ScalarType _st = ::detail::scalar_type(the_type); ; switch (_st) { case at::ScalarType::Double: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Double)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Double), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - 
/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu(844): warning #191-D: type qualifier is meaningless on cast type (same expansion echoed again, omitted)
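One more detail worth pulling out of the spew: for small rows (dim_size <= 1024 and at most 4096 bytes per row) the forward pass is launched in chunks, stepping the pointers between launches so that no single launch covers more than about 2^30 elements. A simplified sketch of that loop (launch_one_chunk is a hypothetical stand-in for dispatch_softmax_forward):

#include <algorithm>
#include <cstdint>

// Chunked launch loop from the expansion above: each launch handles at most
// (1 << 30) / dim_size rows, so no single grid exceeds ~2^30 elements.
template <typename scalar_t, typename LaunchFn>
void launch_in_chunks(scalar_t* out, const scalar_t* in,
                      std::int64_t outer_size, std::int64_t dim_size,
                      LaunchFn launch_one_chunk) {
  std::int64_t remaining = outer_size;
  const std::int64_t chunk_size = (std::int64_t{1} << 30) / dim_size;
  while (remaining > 0) {
    // The final chunk may cover fewer rows than chunk_size.
    launch_one_chunk(out, in, std::min(remaining, chunk_size));
    in += chunk_size * dim_size;
    out += chunk_size * dim_size;
    remaining -= chunk_size;
  }
}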
If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Double), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Float: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Float)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), 
(::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Float), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Half: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Half)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { 
::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Half), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::BFloat16: { do { if constexpr 
(!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::BFloat16)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::BFloat16), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), 
"/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } default: do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str('"', at_dispatch_name, "\" not implemented for '", toString(_st), "'")))); }; } while (false); } }() ^ /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu(844): warning #191-D: type qualifier is meaningless on cast type [&] { const auto& the_type = input.scalar_type(); constexpr const char* at_dispatch_name = "host_softmax"; at::ScalarType _st = ::detail::scalar_type(the_type); ; switch (_st) { case at::ScalarType::Double: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Double)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Double), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += 
chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Float: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Float)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Float), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { 
dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Half: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Half)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Half), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && 
dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::BFloat16: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::BFloat16)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::BFloat16), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } 
while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } default: do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str('"', at_dispatch_name, "\" not implemented for '", toString(_st), "'")))); }; } while (false); } }() ^
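The nvcc diagnostic #191-D above is cosmetic: a top-level const/volatile qualifier on a cast's target type has no effect, and the compiler reports it once per expansion of the dtype-dispatch macro at SoftMax.cu line 844, printing the same expanded lambda (cases Double, Float, Half, BFloat16, plus a default that raises a TORCH_CHECK failure) after each occurrence. Stripped of the dispatch boilerplate, the repeated body makes one kernel choice per dtype. The standalone C++ sketch below mirrors that logic under illustrative assumptions; the names choose_path and launch_chunked and the shared-memory constants are stand-ins for exposition, not the actual ATen source:

    #include <algorithm>
    #include <cstdint>
    #include <cstdio>

    // Illustrative stand-ins (not the ATen definitions): ALIGN_BYTES is the
    // vector-load alignment; ilp below plays the role of
    // sizeof(float4) / sizeof(scalar_t) from the expansion.
    constexpr int ALIGN_BYTES = 16;

    enum class Path { WarpSoftmax, SharedMem, Global };

    // Mirrors the per-dtype heuristic visible in the expanded dispatch body:
    // short rows take the chunked warp-softmax path; longer rows use the
    // shared-memory kernel only when the whole row fits in smem next to the
    // per-warp reduction scratch and pointers/row length are aligned.
    Path choose_path(int64_t dim_size, size_t scalar_bytes,
                     const void* in, const void* out,
                     size_t smem_per_block, size_t smem_reduction_sz) {
      if (dim_size <= 1024 && dim_size * (int64_t)scalar_bytes <= 4096)
        return Path::WarpSoftmax;
      int64_t ilp = 16 / (int64_t)scalar_bytes;  // sizeof(float4)/sizeof(scalar_t)
      int64_t max_elems =
          (int64_t)((smem_per_block - smem_reduction_sz) / scalar_bytes);
      bool ok = dim_size < max_elems;
      ok &= (uintptr_t)in % ALIGN_BYTES == 0;
      ok &= (uintptr_t)out % ALIGN_BYTES == 0;
      ok &= dim_size % ilp == 0;
      return ok ? Path::SharedMem : Path::Global;
    }

    // The warp path additionally splits the batch so a single launch covers
    // at most (1 << 30) / dim_size rows, advancing both pointers by
    // chunk_size * dim_size elements between launches.
    void launch_chunked(int64_t outer_size, int64_t dim_size) {
      int64_t remaining = outer_size;
      int64_t chunk_size = (1L << 30) / dim_size;
      while (remaining > 0) {
        int64_t rows = std::min(remaining, chunk_size);
        std::printf("kernel launch over %lld rows\n", (long long)rows);
        remaining -= chunk_size;
      }
    }

    int main() {
      alignas(16) static float in[2048], out[2048];
      Path p = choose_path(2048, sizeof(float), in, out,
                           /*smem_per_block=*/48 * 1024,
                           /*smem_reduction_sz=*/32 * sizeof(float));
      std::printf("chosen path: %d\n", (int)p);
      launch_chunked(/*outer_size=*/1 << 20, /*dim_size=*/2048);
      return 0;
    }

With these example numbers, a 2048-element float row is too large for the warp path (8 KiB of row data exceeds the 4096-byte bound) but qualifies for the shared-memory kernel, and a batch of 2^20 rows is processed in two launches of 524288 rows each, since (1 << 30) / 2048 = 524288.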
If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Double), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Float: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Float)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), 
(::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Float), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Half: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Half)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { 
::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Half), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::BFloat16: { do { if constexpr 
(!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::BFloat16)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::BFloat16), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), 
"/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } default: do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str('"', at_dispatch_name, "\" not implemented for '", toString(_st), "'")))); }; } while (false); } }() ^ /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu(844): warning #191-D: type qualifier is meaningless on cast type [&] { const auto& the_type = input.scalar_type(); constexpr const char* at_dispatch_name = "host_softmax"; at::ScalarType _st = ::detail::scalar_type(the_type); ; switch (_st) { case at::ScalarType::Double: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Double)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Double), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += 
chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Float: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Float)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Float), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { 
dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Half: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Half)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Half), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && 
dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::BFloat16: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::BFloat16)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::BFloat16), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } 
while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } default: do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str('"', at_dispatch_name, "\" not implemented for '", toString(_st), "'")))); }; } while (false); } }() ^ /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu(844): warning #191-D: type qualifier is meaningless on cast type [&] { const auto& the_type = input.scalar_type(); constexpr const char* at_dispatch_name = "host_softmax"; at::ScalarType _st = ::detail::scalar_type(the_type); ; switch (_st) { case at::ScalarType::Double: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Double)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? 
If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Double), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Float: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Float)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), 
(::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Float), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Half: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Half)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { 
::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Half), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::BFloat16: { do { if constexpr 
(!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::BFloat16)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::BFloat16), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), 
"/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } default: do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str('"', at_dispatch_name, "\" not implemented for '", toString(_st), "'")))); }; } while (false); } }() ^ /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu(844): warning #191-D: type qualifier is meaningless on cast type [&] { const auto& the_type = input.scalar_type(); constexpr const char* at_dispatch_name = "host_softmax"; at::ScalarType _st = ::detail::scalar_type(the_type); ; switch (_st) { case at::ScalarType::Double: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Double)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Double), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += 
chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Float: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Float)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Float), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { 
dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Half: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Half)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Half), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && 
dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::BFloat16: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::BFloat16)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::BFloat16), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } 
while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } default: do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? 
If so, " "please report an enhancement request to PyTorch.)", ::c10::str('"', at_dispatch_name, "\" not implemented for '", toString(_st), "'")))); }; } while (false); } }() ^ [ 94%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/SortStable.cu.o /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu: In instantiation of ‘at::Tensor at::native::_GLOBAL__N__08542f1a_10_SoftMax_cu_9f978f63::host_softmax(const at::Tensor&, int64_t, bool, const at::Tensor&) [with Epilogue = LogSoftMaxForwardEpilogue; bool is_log_softmax = true; int64_t = long int]’: /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu:1072:56: required from here /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu:844:2132: warning: comparison of integer expressions of different signedness: ‘int64_t’ {aka ‘long int’} and ‘long unsigned int’ [-Wsign-compare] 844 | AT_DISPATCH_FLOATING_TYPES_AND2(at::ScalarType::Half, at::ScalarType::BFloat16, input.scalar_type(), "host_softmax", [&] { | ^ /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu:844: warning: comparison of integer expressions of different signedness: ‘int64_t’ {aka ‘long int’} and ‘long unsigned int’ [-Wsign-compare] 844 | AT_DISPATCH_FLOATING_TYPES_AND2(at::ScalarType::Half, at::ScalarType::BFloat16, input.scalar_type(), "host_softmax", [&] { | /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu:844: warning: comparison of integer expressions of different signedness: ‘int64_t’ {aka ‘long int’} and ‘long unsigned int’ [-Wsign-compare] /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu:844: warning: comparison of integer expressions of different signedness: ‘int64_t’ {aka ‘long int’} and ‘long unsigned int’ [-Wsign-compare] /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu:844: warning: comparison of integer expressions of different signedness: ‘int64_t’ {aka ‘long int’} and ‘long unsigned int’ [-Wsign-compare] /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu:844: warning: comparison of integer expressions of different signedness: ‘int64_t’ {aka ‘long int’} and ‘long unsigned int’ [-Wsign-compare] /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu:844: warning: comparison of integer expressions of different signedness: ‘int64_t’ {aka ‘long int’} and ‘long unsigned int’ [-Wsign-compare] /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu:844: warning: comparison of integer expressions of different signedness: ‘int64_t’ {aka ‘long int’} and ‘long unsigned int’ [-Wsign-compare] /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu: In instantiation of ‘at::Tensor at::native::_GLOBAL__N__08542f1a_10_SoftMax_cu_9f978f63::host_softmax(const at::Tensor&, int64_t, bool, const at::Tensor&) [with Epilogue = SoftMaxForwardEpilogue; bool is_log_softmax = false; int64_t = long int]’: /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu:1096:54: required from here /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu:844:2132: warning: comparison of integer expressions of different signedness: ‘int64_t’ {aka ‘long int’} and ‘long unsigned int’ [-Wsign-compare] 844 | AT_DISPATCH_FLOATING_TYPES_AND2(at::ScalarType::Half, at::ScalarType::BFloat16, input.scalar_type(), "host_softmax", [&] { | ^ /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu:844: warning: comparison of integer expressions of different signedness: ‘int64_t’ {aka ‘long int’} and 
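[editor's note: every -Wsign-compare hit above points at the shared-memory feasibility test inside the dispatched host_softmax lambda, where the signed int64_t softmax extent is compared against an unsigned capacity derived from cudaDeviceProp::sharedMemPerBlock; the warning repeats eight times per instantiation because the comparison is inlined once per dtype case and branch. The sketch below is a simplified reconstruction under that reading, not the PyTorch source; kAlignBytes and the function name are placeholders. The #191-D notes from nvcc's EDG front end are a separate nit: a cv-qualifier on a cast to a non-class value type has no effect.]

// Simplified reconstruction (an assumption for illustration, not the PyTorch
// source) of the check that trips -Wsign-compare at SoftMax.cu:844.
#include <cstddef>
#include <cstdint>

constexpr std::size_t kAlignBytes = 16;  // placeholder for ALIGN_BYTES

bool can_use_shared_memory(std::int64_t dim_size,         // signed tensor extent
                           std::size_t shared_per_block,  // sharedMemPerBlock
                           std::size_t smem_reduction_sz,
                           std::size_t elem_size, int ilp,
                           const void* in, const void* out) {
  // Unsigned, because it is computed from size_t device properties.
  const std::size_t max_elements_per_smem =
      (shared_per_block - smem_reduction_sz) / elem_size;

  // gcc: "comparison of integer expressions of different signedness:
  // 'int64_t' {aka 'long int'} and 'long unsigned int' [-Wsign-compare]".
  // Casting one side, e.g. static_cast<std::size_t>(dim_size), silences it.
  bool ok = dim_size < max_elements_per_smem;

  // Alignment and vectorization (ILP) checks, mirroring the expansion above.
  ok &= reinterpret_cast<std::uintptr_t>(in) % kAlignBytes == 0;
  ok &= reinterpret_cast<std::uintptr_t>(out) % kAlignBytes == 0;
  ok &= dim_size % ilp == 0;

  // nvcc/EDG "#191-D: type qualifier is meaningless on cast type" is the
  // class of warning triggered by cv-qualified casts of this shape, where
  // the const on a prvalue result has no effect:
  return static_cast<const bool>(ok);
}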
/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu:844: warning: comparison of integer expressions of different signedness: ‘int64_t’ {aka ‘long int’} and ‘long unsigned int’ [-Wsign-compare]
  844 | AT_DISPATCH_FLOATING_TYPES_AND2(at::ScalarType::Half, at::ScalarType::BFloat16, input.scalar_type(), "host_softmax", [&] {
      |
/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu:844: warning: comparison of integer expressions of different signedness: ‘int64_t’ {aka ‘long int’} and ‘long unsigned int’ [-Wsign-compare]
/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu:844: warning: comparison of integer expressions of different signedness: ‘int64_t’ {aka ‘long int’} and ‘long unsigned int’ [-Wsign-compare]
/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu:844: warning: comparison of integer expressions of different signedness: ‘int64_t’ {aka ‘long int’} and ‘long unsigned int’ [-Wsign-compare]
/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu:844: warning: comparison of integer expressions of different signedness: ‘int64_t’ {aka ‘long int’} and ‘long unsigned int’ [-Wsign-compare]
/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu:844: warning: comparison of integer expressions of different signedness: ‘int64_t’ {aka ‘long int’} and ‘long unsigned int’ [-Wsign-compare]
/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu:844: warning: comparison of integer expressions of different signedness: ‘int64_t’ {aka ‘long int’} and ‘long unsigned int’ [-Wsign-compare]
[ 94%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/Sorting.cu.o
[ 94%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/SparseBinaryOpIntersectionKernel.cu.o
[ 94%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/SparseMM.cu.o
[ 94%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/SpectralOps.cu.o
[ 94%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/StepKernel.cu.o
[ 94%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/SummaryOps.cu.o
[ 94%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/TensorCompare.cu.o
[ 94%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/TensorModeKernel.cu.o
[ 94%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/TensorShape.cu.o
[ 94%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/TensorTopK.cu.o
[ 94%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/TensorTransformations.cu.o
[ 94%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/TriangularOps.cu.o
[ 94%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/UnaryComplexKernels.cu.o
[ 94%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/UnaryFractionKernels.cu.o
[ 94%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/UnaryGammaKernels.cu.o
[ 94%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/UnaryGeometricAcosKernel.cu.o
[ 94%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/UnaryGeometricAcoshKernel.cu.o
[ 94%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/UnaryGeometricAsinKernel.cu.o
[ 94%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/UnaryGeometricAsinhKernel.cu.o
[ 94%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/UnaryGeometricAtanKernel.cu.o
[ 94%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/UnaryGeometricAtanhKernel.cu.o
[ 94%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/UnaryGeometricCosKernel.cu.o
[ 94%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/UnaryGeometricCoshKernel.cu.o
[ 94%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/UnaryGeometricSinKernel.cu.o
[ 94%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/UnaryGeometricSinhKernel.cu.o
[ 94%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/UnaryGeometricTanKernel.cu.o
[ 94%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/UnaryGeometricTanhKernel.cu.o
[ 94%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/UnaryLogKernels.cu.o
[ 94%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/UnaryOpsKernel.cu.o
[ 94%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/UnarySignKernels.cu.o
[ 94%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/UnarySpecialOpsKernel.cu.o
[ 94%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/UnfoldBackwardKernel.cu.o
[ 94%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/UniqueCub.cu.o
[ 94%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/UpSampleBicubic2d.cu.o
[ 94%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/UpSampleBilinear2d.cu.o
[ 94%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/UpSampleLinear1d.cu.o
[ 94%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/UpSampleNearest1d.cu.o
[ 94%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/UpSampleNearest2d.cu.o
[ 94%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/UpSampleNearest3d.cu.o
[ 94%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/UpSampleTrilinear3d.cu.o
[ 94%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/ValidateCompressedIndicesKernel.cu.o
[ 94%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/WeightNorm.cu.o
[ 94%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/ZetaKernel.cu.o
[ 94%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/airy_ai.cu.o
[ 94%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/bessel_j0.cu.o
[ 94%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/bessel_j1.cu.o
[ 94%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/bessel_y0.cu.o
[ 95%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/bessel_y1.cu.o
[ 95%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/chebyshev_polynomial_t.cu.o
[ 95%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/chebyshev_polynomial_u.cu.o
[ 95%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/chebyshev_polynomial_v.cu.o
[ 95%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/chebyshev_polynomial_w.cu.o
[ 95%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/fused_adam_amsgrad_impl.cu.o
[ 95%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/fused_adam_impl.cu.o
[ 95%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/fused_adamw_amsgrad_impl.cu.o
[ 95%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/fused_adamw_impl.cu.o
[ 95%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/group_norm_kernel.cu.o
[ 95%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/hermite_polynomial_h.cu.o
[ 95%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/hermite_polynomial_he.cu.o
[ 95%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/int4mm.cu.o
[ 95%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/laguerre_polynomial_l.cu.o
[ 95%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/layer_norm_kernel.cu.o
[ 95%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/legendre_polynomial_p.cu.o
[ 95%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/modified_bessel_i0.cu.o
[ 95%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/modified_bessel_i1.cu.o
[ 95%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/modified_bessel_k0.cu.o
[ 95%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/modified_bessel_k1.cu.o
[ 95%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/scaled_modified_bessel_k0.cu.o
[ 95%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/scaled_modified_bessel_k1.cu.o
[ 95%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/shifted_chebyshev_polynomial_t.cu.o
[ 95%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/shifted_chebyshev_polynomial_u.cu.o
[ 95%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/shifted_chebyshev_polynomial_v.cu.o
[ 95%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/shifted_chebyshev_polynomial_w.cu.o
[ 95%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/spherical_bessel_j0.cu.o
[ 95%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/nested/cuda/NestedTensorBinaryOps.cu.o
[ 95%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/nested/cuda/NestedTensorMatmul.cu.o
[ 95%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/nested/cuda/NestedTensorTransformerFunctions.cu.o
[ 95%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/sparse/cuda/SoftMax.cu.o
[ 95%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/sparse/cuda/SparseCUDATensor.cu.o
[ 95%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/sparse/cuda/SparseCUDATensorMath.cu.o
[ 95%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/sparse/cuda/SparseCsrTensorMath.cu.o
[ 95%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/sparse/cuda/SparseMatMul.cu.o
[ 95%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/sparse/cuda/SparseSemiStructuredLinear.cu.o
[ 95%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/sparse/cuda/SparseSemiStructuredOps.cu.o
[ 95%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/quantized/cuda/Activation.cu.o
[ 95%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/quantized/cuda/AffineQuantizer.cu.o
[ 95%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/quantized/cuda/EmbeddingBag.cu.o
[ 95%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/quantized/cuda/FakeQuantizeCore.cu.o
[ 95%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/quantized/cuda/FusedObsFakeQuant.cu.o
[ 95%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/quantized/cuda/IntReprQuant.cu.o
[ 95%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/quantized/cuda/MakePerTensorQuantizedTensor.cu.o
[ 95%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/attention.cu.o
[ 95%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/attention_backward.cu.o
[ 95%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/flash_attn/kernels/flash_bwd_hdim128_bf16_sm80.cu.o
[ 95%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/flash_attn/kernels/flash_bwd_hdim128_fp16_sm80.cu.o
[ 95%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/flash_attn/kernels/flash_bwd_hdim160_bf16_sm80.cu.o
[ 95%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/flash_attn/kernels/flash_bwd_hdim160_fp16_sm80.cu.o
[ 95%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/flash_attn/kernels/flash_bwd_hdim192_bf16_sm80.cu.o
[ 95%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/flash_attn/kernels/flash_bwd_hdim192_fp16_sm80.cu.o
[ 95%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/flash_attn/kernels/flash_bwd_hdim224_bf16_sm80.cu.o
[ 95%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/flash_attn/kernels/flash_bwd_hdim224_fp16_sm80.cu.o
[ 95%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/flash_attn/kernels/flash_bwd_hdim256_bf16_sm80.cu.o
[ 95%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/flash_attn/kernels/flash_bwd_hdim256_fp16_sm80.cu.o
[ 95%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/flash_attn/kernels/flash_bwd_hdim32_bf16_sm80.cu.o
[ 95%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/flash_attn/kernels/flash_bwd_hdim32_fp16_sm80.cu.o
[ 95%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/flash_attn/kernels/flash_bwd_hdim64_bf16_sm80.cu.o
[ 95%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/flash_attn/kernels/flash_bwd_hdim64_fp16_sm80.cu.o
[ 96%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/flash_attn/kernels/flash_bwd_hdim96_bf16_sm80.cu.o
[ 96%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/flash_attn/kernels/flash_bwd_hdim96_fp16_sm80.cu.o
[ 96%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/flash_attn/kernels/flash_fwd_hdim128_bf16_sm80.cu.o
[ 96%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/flash_attn/kernels/flash_fwd_hdim128_fp16_sm80.cu.o
[ 96%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/flash_attn/kernels/flash_fwd_hdim160_bf16_sm80.cu.o
[ 96%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/flash_attn/kernels/flash_fwd_hdim160_fp16_sm80.cu.o
[ 96%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/flash_attn/kernels/flash_fwd_hdim192_bf16_sm80.cu.o
[ 96%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/flash_attn/kernels/flash_fwd_hdim192_fp16_sm80.cu.o
[ 96%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/flash_attn/kernels/flash_fwd_hdim224_bf16_sm80.cu.o
[ 96%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/flash_attn/kernels/flash_fwd_hdim224_fp16_sm80.cu.o
[ 96%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/flash_attn/kernels/flash_fwd_hdim256_bf16_sm80.cu.o
[ 96%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/flash_attn/kernels/flash_fwd_hdim256_fp16_sm80.cu.o
[ 96%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/flash_attn/kernels/flash_fwd_hdim32_bf16_sm80.cu.o
[ 96%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/flash_attn/kernels/flash_fwd_hdim32_fp16_sm80.cu.o
[ 96%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/flash_attn/kernels/flash_fwd_hdim64_bf16_sm80.cu.o
[ 96%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/flash_attn/kernels/flash_fwd_hdim64_fp16_sm80.cu.o
[ 96%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/flash_attn/kernels/flash_fwd_hdim96_bf16_sm80.cu.o
[ 96%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/flash_attn/kernels/flash_fwd_hdim96_fp16_sm80.cu.o
[ 96%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/flash_attn/kernels/flash_fwd_split_hdim128_bf16_sm80.cu.o
[ 96%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/flash_attn/kernels/flash_fwd_split_hdim128_fp16_sm80.cu.o
[ 96%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/flash_attn/kernels/flash_fwd_split_hdim160_bf16_sm80.cu.o
[ 96%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/flash_attn/kernels/flash_fwd_split_hdim160_fp16_sm80.cu.o
[ 96%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/flash_attn/kernels/flash_fwd_split_hdim192_bf16_sm80.cu.o
[ 96%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/flash_attn/kernels/flash_fwd_split_hdim192_fp16_sm80.cu.o
[ 96%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/flash_attn/kernels/flash_fwd_split_hdim224_bf16_sm80.cu.o
[ 96%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/flash_attn/kernels/flash_fwd_split_hdim224_fp16_sm80.cu.o
[ 96%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/flash_attn/kernels/flash_fwd_split_hdim256_bf16_sm80.cu.o
[ 96%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/flash_attn/kernels/flash_fwd_split_hdim256_fp16_sm80.cu.o
[ 96%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/flash_attn/kernels/flash_fwd_split_hdim32_bf16_sm80.cu.o
[ 96%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/flash_attn/kernels/flash_fwd_split_hdim32_fp16_sm80.cu.o
[ 96%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/flash_attn/kernels/flash_fwd_split_hdim64_bf16_sm80.cu.o
[ 96%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/flash_attn/kernels/flash_fwd_split_hdim64_fp16_sm80.cu.o
[ 96%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/flash_attn/kernels/flash_fwd_split_hdim96_bf16_sm80.cu.o
[ 96%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/flash_attn/kernels/flash_fwd_split_hdim96_fp16_sm80.cu.o
[ 96%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/cutlassB_bf16_aligned_k128.cu.o
[ 96%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/cutlassB_bf16_aligned_k128_dropout.cu.o
[ 96%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/cutlassB_bf16_aligned_k32.cu.o
[ 96%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/cutlassB_bf16_aligned_k32_dropout.cu.o
[ 96%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/cutlassB_bf16_aligned_k64.cu.o
[ 96%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/cutlassB_bf16_aligned_k64_dropout.cu.o
[ 96%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/cutlassB_bf16_aligned_k65536.cu.o
[ 96%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/cutlassB_bf16_aligned_k65536_dropout.cu.o
[ 96%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/cutlassB_bf16_aligned_k96.cu.o
[ 96%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/cutlassB_f16_aligned_k128.cu.o
[ 96%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/cutlassB_f16_aligned_k128_dropout.cu.o
[ 96%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/cutlassB_f16_aligned_k32.cu.o
[ 96%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/cutlassB_f16_aligned_k32_dropout.cu.o
[ 96%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/cutlassB_f16_aligned_k64.cu.o
[ 96%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/cutlassB_f16_aligned_k64_dropout.cu.o
[ 96%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/cutlassB_f16_aligned_k65536.cu.o
[ 96%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/cutlassB_f16_aligned_k65536_dropout.cu.o
[ 96%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/cutlassB_f16_aligned_k96.cu.o
[ 96%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/cutlassB_f16_notaligned_k128.cu.o
[ 96%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/cutlassB_f16_notaligned_k128_dropout.cu.o
[ 96%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/cutlassB_f16_notaligned_k32.cu.o
[ 96%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/cutlassB_f16_notaligned_k32_dropout.cu.o
[ 96%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/cutlassB_f16_notaligned_k64.cu.o
[ 96%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/cutlassB_f16_notaligned_k64_dropout.cu.o
[ 96%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/cutlassB_f16_notaligned_k65536.cu.o
[ 96%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/cutlassB_f16_notaligned_k65536_dropout.cu.o
[ 97%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/cutlassB_f32_aligned_k128.cu.o
[ 97%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/cutlassB_f32_aligned_k128_dropout.cu.o
[ 97%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/cutlassB_f32_aligned_k32.cu.o
[ 97%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/cutlassB_f32_aligned_k32_dropout.cu.o
[ 97%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/cutlassB_f32_aligned_k64.cu.o
[ 97%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/cutlassB_f32_aligned_k64_dropout.cu.o
[ 97%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/cutlassB_f32_aligned_k65536.cu.o
[ 97%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/cutlassB_f32_aligned_k65536_dropout.cu.o
[ 97%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/cutlassB_f32_notaligned_k128.cu.o
[ 97%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/cutlassB_f32_notaligned_k128_dropout.cu.o
[ 97%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/cutlassB_f32_notaligned_k32.cu.o
[ 97%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/cutlassB_f32_notaligned_k32_dropout.cu.o
[ 97%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/cutlassB_f32_notaligned_k64.cu.o
[ 97%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/cutlassB_f32_notaligned_k64_dropout.cu.o
[ 97%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/cutlassB_f32_notaligned_k65536.cu.o
[ 97%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/cutlassB_f32_notaligned_k65536_dropout.cu.o
[ 97%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/cutlassF_bf16_aligned.cu.o
[ 97%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/cutlassF_f16_aligned.cu.o
[ 97%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/cutlassF_f16_notaligned.cu.o
[ 97%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/cutlassF_f32_aligned.cu.o
[ 97%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/cutlassF_f32_notaligned.cu.o
[ 97%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/RegisterCUDA.cpp.o
[ 97%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/RegisterNestedTensorCUDA.cpp.o
[ 97%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/RegisterQuantizedCUDA.cpp.o
[ 97%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/RegisterSparseCUDA.cpp.o
[ 97%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/RegisterSparseCsrCUDA.cpp.o
[ 97%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/UfuncCUDA_add.cu.o
[ 97%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/cuda/CUDABlas.cpp.o
[ 97%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/cuda/CUDASparseBlas.cpp.o
[ 97%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/cuda/CublasHandlePool.cpp.o
[ 97%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/cuda/tunable/StreamTimer.cpp.o
[ 97%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/cuda/tunable/Tunable.cpp.o
[ 97%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/Activation.cpp.o
[ 97%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/LinearAlgebraStubs.cpp.o
[ 97%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/Blas.cpp.o
[ 97%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/Distributions.cpp.o
[ 97%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/Equal.cpp.o
[ 97%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/GridSampler.cpp.o
[ 97%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/IndexKernel.cpp.o
[ 97%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/ReduceOps.cpp.o
[ 97%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/ScanKernels.cpp.o
[ 97%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/Sort.cpp.o
[ 97%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/Sorting.cpp.o
[ 97%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/TensorModeKernel.cpp.o
[ 97%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/TensorShapeCUDA.cpp.o
[ 97%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/TensorTopK.cpp.o
[ 97%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/jit_utils.cpp.o
[ 97%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/nested/cuda/NestedTensorTransformerFunctions.cpp.o
[ 97%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/sparse/cuda/SparseBlas.cpp.o
[ 97%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/sparse/cuda/SparseBlasImpl.cpp.o
[ 97%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/sparse/cuda/SparseBlasLegacy.cpp.o
[ 97%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/sparse/cuda/SparseCUDABlas.cpp.o
[ 97%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/flash_attn/flash_api.cpp.o
[ 97%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/torch/csrc/CudaIPCTypes.cpp.o
[ 97%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/torch/csrc/cuda/comm.cpp.o
[ 97%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/torch/csrc/cuda/memory_snapshot.cpp.o
[ 97%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/torch/csrc/inductor/aoti_runner/model_container_runner_cuda.cpp.o
[ 97%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/torch/csrc/inductor/aoti_torch/shim_cuda.cpp.o
[ 97%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/torch/csrc/jit/codegen/fuser/cuda/fused_kernel.cpp.o
[ 98%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/torch/csrc/profiler/stubs/cuda.cpp.o
[ 98%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/torch/csrc/autograd/functions/comm.cpp.o
[ 98%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/torch/csrc/jit/passes/frozen_conv_add_relu_fusion_cuda.cpp.o
[ 98%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/torch/csrc/jit/tensorexpr/cuda_codegen.cpp.o
[ 98%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/torch/csrc/jit/runtime/register_cuda_ops.cpp.o
[ 98%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/Unique.cu.o
[ 98%] Linking CXX shared library ../lib/libtorch_cuda.so
Warning: Unused direct dependencies:
	libc10_cuda.so
	/lib64/libgloo_cuda.so.1
	/usr/local/cuda-12.3/lib64/libcurand.so.10
	libc10.so.2.4
	/lib64/libgflags.so.2.2
	libtorch_cpu.so.2.4
[ 98%] Built target torch_cuda
[ 98%] Building CXX object caffe2/CMakeFiles/torch_cuda_linalg.dir/__/aten/src/ATen/native/cuda/linalg/BatchLinearAlgebraLibBlas.cpp.o
[ 98%] Building CXX object caffe2/CMakeFiles/torch_cuda_linalg.dir/__/aten/src/ATen/native/cuda/linalg/BatchLinearAlgebraLib.cpp.o
[ 98%] Building CXX object caffe2/CMakeFiles/torch_cuda_linalg.dir/__/aten/src/ATen/native/cuda/linalg/BatchLinearAlgebra.cpp.o
[ 98%] Building CXX object caffe2/CMakeFiles/torch.dir/__/empty.cpp.o
[ 98%] Linking CXX shared library ../lib/libtorch.so
Warning: Unused direct dependencies:
	/lib64/libstdc++.so.6
	libtorch_cpu.so.2.4
	libtorch_cuda.so
[ 98%] Built target torch
[ 98%] Building CXX object caffe2/CMakeFiles/torch_cuda_linalg.dir/__/aten/src/ATen/native/cuda/linalg/CUDASolver.cpp.o
[ 98%] Building CXX object caffe2/CMakeFiles/torch_cuda_linalg.dir/__/aten/src/ATen/native/cuda/linalg/CusolverDnHandlePool.cpp.o
[ 98%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/autograd/generated/python_functions_1.cpp.o
[ 98%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/autograd/generated/python_functions_0.cpp.o
[ 98%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/autograd/generated/python_functions_2.cpp.o
[ 98%] Linking CXX shared library ../lib/libtorch_cuda_linalg.so
Warning: Unused direct dependencies:
	libtorch_cpu.so.2.4
	libtorch_cuda.so
	libc10_cuda.so
	/usr/local/cuda-12.3/lib64/libnvToolsExt.so.1
	/lib64/libprotobuf.so.32
	libc10.so.2.4
	/lib64/libgflags.so.2.2
	/lib64/libglog.so.0
	/lib64/libqnnpack.so.1
	/lib64/libgloo.so.1
	/lib64/libgloo_cuda.so.1
[ 98%] Built target torch_cuda_linalg
[ 98%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/autograd/generated/python_functions_3.cpp.o
[ 98%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/autograd/generated/python_functions_4.cpp.o
[ 98%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/autograd/generated/python_variable_methods.cpp.o
[ 98%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/autograd/generated/python_torch_functions_0.cpp.o
[ 98%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/autograd/generated/python_torch_functions_1.cpp.o
[ 98%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/autograd/generated/python_torch_functions_2.cpp.o
[ 98%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/autograd/generated/python_nn_functions.cpp.o
[ 98%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/autograd/generated/python_fft_functions.cpp.o
[ 98%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/autograd/generated/python_linalg_functions.cpp.o
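[editor's note: the "Warning: Unused direct dependencies" lines above come from the link steps and list shared libraries that were named on the link line but whose symbols the produced library never references (unused DT_NEEDED entries). A minimal toy illustration follows; it is an assumed example, not part of this build.]

// demo.cpp -- toy illustration (assumption, not from this build) of an unused
// direct dependency: nothing from libm is referenced here, yet linking with
//   g++ demo.cpp -lm -o demo
// records libm.so.6 as a DT_NEEDED entry ("readelf -d demo" shows it), while
//   g++ demo.cpp -Wl,--as-needed -lm -o demo
// lets GNU ld drop the unused entry instead.
#include <cstdio>

int main() {
  std::puts("no libm symbol is used here");
  return 0;
}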
caffe2/torch/CMakeFiles/torch_python.dir/csrc/autograd/generated/python_nested_functions.cpp.o [ 98%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/autograd/generated/python_sparse_functions.cpp.o [ 98%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/autograd/generated/python_special_functions.cpp.o [ 98%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/autograd/generated/python_return_types.cpp.o [ 98%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/autograd/generated/python_enum_tag.cpp.o [ 98%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/DataLoader.cpp.o [ 98%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/Device.cpp.o [ 98%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/Dtype.cpp.o [ 98%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/DynamicTypes.cpp.o [ 98%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/Exceptions.cpp.o [ 98%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/Generator.cpp.o [ 98%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/Layout.cpp.o [ 98%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/MemoryFormat.cpp.o [ 98%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/QScheme.cpp.o [ 98%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/Module.cpp.o [ 98%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/PyInterpreter.cpp.o [ 98%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/python_dimname.cpp.o [ 98%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/Size.cpp.o [ 98%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/Storage.cpp.o [ 98%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/StorageMethods.cpp.o [ 98%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/StorageSharing.cpp.o [ 98%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/Stream.cpp.o [ 98%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/TypeInfo.cpp.o [ 98%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/api/src/python/init.cpp.o [ 98%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/autograd/functions/init.cpp.o [ 98%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/autograd/init.cpp.o [ 98%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/autograd/profiler_python.cpp.o [ 98%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/autograd/python_anomaly_mode.cpp.o [ 98%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/autograd/python_saved_variable_hooks.cpp.o [ 98%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/autograd/python_cpp_function.cpp.o [ 98%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/autograd/python_engine.cpp.o [ 98%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/autograd/python_function.cpp.o [ 98%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/autograd/python_hook.cpp.o [ 99%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/autograd/python_legacy_variable.cpp.o [ 99%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/autograd/python_nested_functions_manual.cpp.o [ 99%] Building CXX object 
[ 99%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/autograd/python_variable.cpp.o
[ 99%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/autograd/python_variable_indexing.cpp.o
[ 99%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/dynamo/python_compiled_autograd.cpp.o
[ 99%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/dynamo/cache_entry.cpp.o
[ 99%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/dynamo/cpp_shim.cpp.o
[ 99%] Building C object caffe2/torch/CMakeFiles/torch_python.dir/csrc/dynamo/cpython_defs.c.o
[ 99%] Building C object caffe2/torch/CMakeFiles/torch_python.dir/csrc/dynamo/eval_frame.c.o
[ 99%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/dynamo/extra_state.cpp.o
[ 99%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/dynamo/guards.cpp.o
[ 99%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/dynamo/init.cpp.o
[ 99%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/functorch/init.cpp.o
[ 99%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/mps/Module.cpp.o
[ 99%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/inductor/aoti_runner/pybind.cpp.o
[ 99%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/jit/backends/backend_init.cpp.o
[ 99%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/jit/python/init.cpp.o
[ 99%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/jit/passes/onnx.cpp.o
[ 99%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/jit/passes/onnx/cast_all_constant_to_floating.cpp.o
[ 99%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/jit/passes/onnx/deduplicate_initializers.cpp.o
[ 99%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/jit/passes/onnx/eval_peephole.cpp.o
[ 99%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/jit/passes/onnx/constant_fold.cpp.o
[ 99%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/jit/passes/onnx/constant_map.cpp.o
[ 99%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/jit/passes/onnx/eliminate_unused_items.cpp.o
[ 99%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/jit/passes/onnx/fixup_onnx_controlflow.cpp.o
[ 99%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/jit/passes/onnx/list_model_parameters.cpp.o
[ 99%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/jit/passes/onnx/function_substitution.cpp.o
[ 99%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/jit/passes/onnx/helper.cpp.o
[ 99%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/jit/passes/onnx/peephole.cpp.o
[ 99%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/jit/passes/onnx/preprocess_for_onnx.cpp.o
[ 99%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/jit/passes/onnx/prepare_division_for_onnx.cpp.o
[ 99%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/jit/passes/onnx/scalar_type_analysis.cpp.o
[ 99%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/jit/passes/onnx/unpack_quantized_weights.cpp.o
[ 99%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/jit/passes/onnx/remove_inplace_ops_for_onnx.cpp.o
[ 99%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/jit/passes/onnx/shape_type_inference.cpp.o
[ 99%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/jit/passes/onnx/function_extraction.cpp.o
[ 99%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/jit/passes/onnx/onnx_log.cpp.o
[ 99%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/jit/passes/onnx/naming.cpp.o
[ 99%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/jit/python/pybind_utils.cpp.o
[ 99%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/jit/passes/onnx/pattern_conversion/autograd_function_process.cpp.o
[ 99%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/jit/passes/onnx/pattern_conversion/common.cpp.o
[ 99%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/jit/passes/onnx/pattern_conversion/pattern_encapsulation.cpp.o
[ 99%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/jit/passes/onnx/pattern_conversion/pattern_conversion.cpp.o
[ 99%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/jit/python/python_arg_flatten.cpp.o
[ 99%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/jit/python/python_custom_class.cpp.o
[ 99%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/jit/python/python_dict.cpp.o
[ 99%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/jit/python/python_interpreter.cpp.o
[ 99%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/jit/python/python_ir.cpp.o
[ 99%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/jit/python/python_list.cpp.o
[ 99%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/jit/python/python_tracer.cpp.o
[ 99%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/jit/python/script_init.cpp.o
[ 99%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/jit/frontend/concrete_module_type.cpp.o
[ 99%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/jit/frontend/tree_views.cpp.o
[ 99%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/jit/python/python_sugared_value.cpp.o
[ 99%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/jit/python/python_tree_views.cpp.o
[ 99%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/jit/runtime/static/init.cpp.o
[ 99%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/jit/tensorexpr/tensorexpr_init.cpp.o
[ 99%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/monitor/python_init.cpp.o
[ 99%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/multiprocessing/init.cpp.o
[100%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/onnx/init.cpp.o
[100%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/profiler/python/init.cpp.o
[100%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/profiler/python/combined_traceback.cpp.o
[100%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/serialization.cpp.o
[100%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/tensor/python_tensor.cpp.o
[100%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/utils/init.cpp.o
[100%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/utils/throughput_benchmark.cpp.o
[100%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/utils.cpp.o
[100%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/utils/device_lazy_init.cpp.o
[100%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/utils/invalid_arguments.cpp.o
[100%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/utils/nested.cpp.o
[100%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/utils/object_ptr.cpp.o
[100%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/utils/python_arg_parser.cpp.o
[100%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/utils/python_dispatch.cpp.o
[100%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/utils/python_symnode.cpp.o
[100%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/utils/pybind.cpp.o
[100%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/utils/pyobject_preservation.cpp.o
[100%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/utils/structseq.cpp.o
[100%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/utils/tensor_apply.cpp.o
[100%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/utils/tensor_dtypes.cpp.o
[100%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/utils/tensor_layouts.cpp.o
[100%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/utils/tensor_memoryformats.cpp.o
[100%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/utils/tensor_qschemes.cpp.o
[100%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/utils/tensor_list.cpp.o
[100%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/utils/tensor_new.cpp.o
[100%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/utils/tensor_numpy.cpp.o
[100%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/utils/tensor_types.cpp.o
[100%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/utils/disable_torch_function.cpp.o
[100%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/utils/verbose.cpp.o
[100%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/cpu/Module.cpp.o
[100%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/lazy/python/init.cpp.o
[100%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/lazy/python/python_util.cpp.o
[100%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/cuda/Event.cpp.o
[100%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/cuda/Module.cpp.o
[100%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/cuda/python_comm.cpp.o
[100%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/cuda/Stream.cpp.o
[100%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/cuda/Graph.cpp.o
[100%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/cuda/shared/cudart.cpp.o
[100%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/cuda/shared/nvtx.cpp.o
[100%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/cuda/utils.cpp.o
[100%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/cuda/CUDAPluggableAllocator.cpp.o
[100%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/cuda/shared/cudnn.cpp.o
[100%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/distributed/c10d/init.cpp.o
[100%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/distributed/c10d/python_comm_hook.cpp.o
[100%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/distributed/autograd/init.cpp.o
[100%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/distributed/rpc/init.cpp.o
[100%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/distributed/rpc/py_rref.cpp.o
[100%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/distributed/rpc/python_functions.cpp.o
[100%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/distributed/rpc/python_rpc_handler.cpp.o
[100%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/distributed/rpc/request_callback_impl.cpp.o
[100%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/distributed/rpc/testing/init.cpp.o
[100%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/distributed/rpc/unpickled_python_call.cpp.o
[100%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/distributed/rpc/unpickled_python_remote_call.cpp.o
[100%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/jit/runtime/register_distributed_ops.cpp.o
[100%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/cuda/python_nccl.cpp.o
[100%] Linking CXX shared library ../../lib/libtorch_python.so
Warning: Unused direct dependencies:
    libshm.so.2.4
    libtorch.so.2.4
    libtorch_cpu.so.2.4
    libtorch_cuda.so
    libc10_cuda.so
    libc10.so.2.4
[100%] Built target torch_python
[100%] Building C object caffe2/torch/CMakeFiles/_C.dir/csrc/stub.c.o
[100%] Building CXX object functorch/CMakeFiles/functorch.dir/csrc/dim/dim.cpp.o
[100%] Building CXX object caffe2/torch/CMakeFiles/nnapi_backend.dir/csrc/jit/backends/nnapi/nnapi_backend_lib.cpp.o
[100%] Building C object functorch/CMakeFiles/functorch.dir/csrc/dim/dim_opcode.c.o
[100%] Building CXX object functorch/CMakeFiles/functorch.dir/csrc/init_dim_only.cpp.o
[100%] Linking C shared library ../../lib/_C.so
Warning: Unused direct dependencies:
    /lib64/libstdc++.so.6
    libtorch_python.so.2.4
[100%] Built target _C
[100%] Building CXX object caffe2/torch/CMakeFiles/nnapi_backend.dir/csrc/jit/backends/nnapi/nnapi_backend_preprocess.cpp.o
[100%] Linking CXX shared library ../../lib/libnnapi_backend.so
Warning: Unused direct dependencies:
    libtorch.so.2.4
    libtorch_python.so.2.4
    libtorch_cpu.so.2.4
    libtorch_cuda.so
    libc10.so.2.4
[100%] Built target nnapi_backend
[100%] Linking CXX shared module functorch.so
[100%] Built target functorch
+ popd
~/build/BUILD/pytorch
+ RPM_EC=0
++ jobs -p
+ exit 0
Executing(%install): /bin/sh -e /var/tmp/rpm-tmp.0SPDnA
+ umask 022
+ cd /builddir/build/BUILD
+ '[' /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64 '!=' / ']'
+ rm -rf /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64
++ dirname /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64
+ mkdir -p /builddir/build/BUILDROOT
+ mkdir /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64
+ CFLAGS='-O2 -fexceptions -grecord-gcc-switches -pipe -Wall -Wno-complain-wrong-lang -Werror=format-security -Wp,-U_FORTIFY_SOURCE,-D_FORTIFY_SOURCE=3 -Wp,-D_GLIBCXX_ASSERTIONS -mbranch-protection=standard -fasynchronous-unwind-tables -fstack-clash-protection -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer -w -fpermissive -Wno-sign-compare -Wno-deprecated-declarations -Wno-nonnull -DEIGEN_HAS_CXX11_MATH=1 '
+ export CFLAGS
~/build/BUILD/pytorch/build ~/build/BUILD/pytorch
+ CXXFLAGS='-O2 -fexceptions -grecord-gcc-switches -pipe -Wall -Wno-complain-wrong-lang -Werror=format-security -Wp,-U_FORTIFY_SOURCE,-D_FORTIFY_SOURCE=3 -Wp,-D_GLIBCXX_ASSERTIONS -mbranch-protection=standard -fasynchronous-unwind-tables -fstack-clash-protection -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer -w -fpermissive -Wno-sign-compare -Wno-deprecated-declarations -Wno-nonnull -DEIGEN_HAS_CXX11_MATH=1 '
+ export CXXFLAGS
+ FFLAGS='-O2 -fexceptions -grecord-gcc-switches -pipe -Wall -Wno-complain-wrong-lang -Werror=format-security -Wp,-U_FORTIFY_SOURCE,-D_FORTIFY_SOURCE=3 -Wp,-D_GLIBCXX_ASSERTIONS -mbranch-protection=standard -fasynchronous-unwind-tables -fstack-clash-protection -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer -w -fpermissive -Wno-sign-compare -Wno-deprecated-declarations -Wno-nonnull -DEIGEN_HAS_CXX11_MATH=1 -I/usr/lib64/gfortran/modules '
+ export FFLAGS
+ FCFLAGS='-O2 -fexceptions -grecord-gcc-switches -pipe -Wall -Wno-complain-wrong-lang -Werror=format-security -Wp,-U_FORTIFY_SOURCE,-D_FORTIFY_SOURCE=3 -Wp,-D_GLIBCXX_ASSERTIONS -mbranch-protection=standard -fasynchronous-unwind-tables -fstack-clash-protection -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer -w -fpermissive -Wno-sign-compare -Wno-deprecated-declarations -Wno-nonnull -DEIGEN_HAS_CXX11_MATH=1 -I/usr/lib64/gfortran/modules '
+ export FCFLAGS
+ VALAFLAGS=-g
+ export VALAFLAGS
+ RUSTFLAGS='-Copt-level=3 -Cdebuginfo=2 -Ccodegen-units=1 -Cstrip=none -Cforce-frame-pointers=yes -Clink-arg=-specs=/usr/lib/rpm/redhat/redhat-package-notes --cap-lints=warn'
+ export RUSTFLAGS
+ LDFLAGS='-Wl,-z,relro -Wl,--as-needed -Wl,--build-id=sha1 -specs=/usr/lib/rpm/redhat/redhat-package-notes -Wl,-lstdc++'
+ export LDFLAGS
+ LT_SYS_LIBRARY_PATH=/usr/lib64:
+ export LT_SYS_LIBRARY_PATH
+ CC=gcc
+ export CC
+ CXX=g++
+ export CXX
+ cd pytorch
+ pushd build
+ export PYTHON_EXECUTABLE=/usr/bin/python3
+ PYTHON_EXECUTABLE=/usr/bin/python3
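The flags exported above are Fedora's hardened build defaults (note -Wp,-D_FORTIFY_SOURCE=3, -fstack-clash-protection and, on aarch64, -mbranch-protection=standard) with package-specific extras such as -w and -fpermissive appended. On a system with redhat-rpm-config installed, the distribution's baseline portion can be inspected with (a sketch, assuming the standard macros are present):

    rpm -E '%{build_cflags}'
    rpm -E '%{build_ldflags}'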
+ make install DESTDIR=/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64
[  0%] Built target clog
[  0%] Built target fp16
[  1%] Built target pytorch_qnnpack
[  1%] Built target fxdiv
[  1%] Built target psimd
[ 63%] Built target microkernels-all
[ 67%] Built target microkernels-prod
[ 67%] Built target logging
[ 67%] Built target hardware-config
[ 67%] Built target indirection
[ 68%] Built target jit
[ 68%] Built target microparams-init
[ 68%] Built target normalization
[ 68%] Built target packing
[ 68%] Built target allocator
[ 68%] Built target memory
[ 68%] Built target cache
[ 68%] Built target microkernel-utils
[ 68%] Built target mutex
[ 68%] Built target post-operation
[ 68%] Built target operator-utils
[ 69%] Built target operators
[ 69%] Built target operator-run
[ 70%] Built target subgraph
[ 70%] Built target convolution-test-helpers
[ 70%] Built target XNNPACK
[ 70%] Built target fmt
[ 72%] Built target c10
[ 72%] Built target c10_cuda
[ 72%] Built target Caffe2_PROTO
[ 72%] Built target caffe2_protos
[ 72%] Built target caffe2_nvrtc
[ 72%] Built target ATEN_CPU_FILES_GEN_TARGET
[ 89%] Built target torch_cpu
[ 89%] Built target ATEN_CUDA_FILES_GEN_TARGET
[ 97%] Built target torch_cuda
[ 97%] Built target torch
[ 97%] Built target torch_cuda_linalg
[ 97%] Built target torch_global_deps
[ 97%] Built target python_copy_files
[ 97%] Built target shm
[ 97%] Built target generate-torch-sources
[ 97%] Built target torch_python_stubs
[ 97%] Built target gen_torch_version
[ 99%] Built target torch_python
[ 99%] Built target _C
[ 99%] Built target nnapi_backend
[100%] Built target torch_shm_manager
[100%] Built target functorch
Install the project...
-- Install configuration: "Release"
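Because every destination in the CMake-generated install rules is prefixed with DESTDIR, the whole tree above lands inside the RPM buildroot rather than on the live system. The same staging technique works outside rpmbuild; a minimal sketch with an illustrative scratch directory:

    # stage the install under ./stage instead of /
    make -C build install DESTDIR="$PWD/stage"
    find stage -name 'libtorch*.so*'

rpmbuild then packages whatever ends up under the buildroot, which is why the commands that follow prune and rearrange files there before the %files list is evaluated.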
+ mkdir -p /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64/usr/lib64
+ find /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64/ -name '*.a' -type f -prune -exec rm -rf '{}' +
+ rm -rf /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64/usr/lib/python3.12
+ mv -f /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64/usr/lib/libc10.so /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64/usr/lib/libc10.so.2.4 /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64/usr/lib/libc10.so.2.4.0 /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64/usr/lib/libc10_cuda.so /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64/usr/lib/libcaffe2_nvrtc.so /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64/usr/lib/libshm.so /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64/usr/lib/libshm.so.2.4 /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64/usr/lib/libshm.so.2.4.0 /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64/usr/lib/libtorch.so /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64/usr/lib/libtorch.so.2.4 /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64/usr/lib/libtorch.so.2.4.0 /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64/usr/lib/libtorch_cpu.so /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64/usr/lib/libtorch_cpu.so.2.4 /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64/usr/lib/libtorch_cpu.so.2.4.0 /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64/usr/lib/libtorch_cuda.so /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64/usr/lib/libtorch_cuda_linalg.so /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64/usr/lib/libtorch_global_deps.so /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64/usr/lib/libtorch_global_deps.so.2.4 /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64/usr/lib/libtorch_global_deps.so.2.4.0 /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64/usr/lib/libtorch_python.so /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64/usr/lib/libtorch_python.so.2.4 /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64/usr/lib/libtorch_python.so.2.4.0 /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64/usr/lib64/
+ popd
~/build/BUILD/pytorch
+ install -D -pm 755 build/lib/libnnapi_backend.so /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/
+ mkdir -p /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/torch/bin
+ install -D -pm 644 build/lib/_C.so /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/torch/
+ install -D -pm 644 build/functorch/functorch.so /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/functorch/_C.so
+ install -D -pm 644 aten/src/THC/THCDeviceUtils.cuh /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/include/THC/
+ ln -sf /usr/include /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/torch/include
+ ln -sf /usr/lib64 /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/torch/lib
+ ln -sf /usr/bin/torch_shm_manager /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/torch/bin/torch_shm_manager
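Everything that follows is the xtrace of a single shell loop from the spec's %install section: it copies each tracked .py file into the buildroot's site-packages, preserving the relative path, with the `+ for ...` line reprinted on every iteration. Reconstructed from the trace (the $RPM_BUILD_ROOT spelling is an assumption; the trace itself shows the expanded literal buildroot path):

    for f in `find ./torch/ -name '*.py'`; do
        # install -D creates missing parent directories; -p preserves mtimes
        install -D -pm 644 "$f" "$RPM_BUILD_ROOT"/usr/lib64/python3.12/site-packages/"$f"
    done

The unquoted backtick expansion word-splits on whitespace; it is safe here only because none of the repository's Python paths contain spaces.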
++ find ./torch/ -name '*.py'
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/version.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/version.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/xpu/streams.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/xpu/streams.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/xpu/random.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/xpu/random.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/xpu/_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/xpu/_utils.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/xpu/_gpu_trace.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/xpu/_gpu_trace.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/xpu/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/xpu/__init__.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/utils/weak.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/weak.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/utils/viz/_cycles.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/viz/_cycles.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/utils/viz/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/viz/__init__.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/utils/throughput_benchmark.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/throughput_benchmark.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/utils/tensorboard/writer.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/tensorboard/writer.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/utils/tensorboard/summary.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/tensorboard/summary.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/utils/tensorboard/_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/tensorboard/_utils.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/utils/tensorboard/_pytorch_graph.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/tensorboard/_pytorch_graph.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/utils/tensorboard/_proto_graph.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/tensorboard/_proto_graph.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/utils/tensorboard/_onnx_graph.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/tensorboard/_onnx_graph.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/utils/tensorboard/_embedding.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/tensorboard/_embedding.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/utils/tensorboard/_convert_np.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/tensorboard/_convert_np.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/utils/tensorboard/_caffe2_graph.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/tensorboard/_caffe2_graph.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/utils/tensorboard/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/tensorboard/__init__.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/utils/show_pickle.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/show_pickle.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/utils/model_zoo.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/model_zoo.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/utils/model_dump/__main__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/model_dump/__main__.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/utils/model_dump/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/model_dump/__init__.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/utils/mobile_optimizer.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/mobile_optimizer.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/utils/mkldnn.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/mkldnn.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/utils/jit/log_extract.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/jit/log_extract.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/utils/jit/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/jit/__init__.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/utils/hooks.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/hooks.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/utils/hipify/version.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/hipify/version.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/utils/hipify/hipify_python.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/hipify/hipify_python.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/utils/hipify/cuda_to_hip_mappings.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/hipify/cuda_to_hip_mappings.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/utils/hipify/constants.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/hipify/constants.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/utils/hipify/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/hipify/__init__.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/utils/flop_counter.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/flop_counter.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/utils/file_baton.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/file_baton.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/utils/dlpack.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/dlpack.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/utils/deterministic.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/deterministic.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/utils/data/sampler.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/data/sampler.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/utils/data/graph_settings.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/data/graph_settings.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/utils/data/graph.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/data/graph.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/utils/data/distributed.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/data/distributed.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/utils/data/dataset.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/data/dataset.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/utils/data/datapipes/utils/snapshot.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/data/datapipes/utils/snapshot.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/utils/data/datapipes/utils/decoder.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/data/datapipes/utils/decoder.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/utils/data/datapipes/utils/common.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/data/datapipes/utils/common.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/utils/data/datapipes/utils/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/data/datapipes/utils/__init__.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/utils/data/datapipes/map/utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/data/datapipes/map/utils.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/utils/data/datapipes/map/grouping.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/data/datapipes/map/grouping.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/utils/data/datapipes/map/combining.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/data/datapipes/map/combining.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/utils/data/datapipes/map/combinatorics.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/data/datapipes/map/combinatorics.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/utils/data/datapipes/map/callable.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/data/datapipes/map/callable.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/utils/data/datapipes/map/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/data/datapipes/map/__init__.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/utils/data/datapipes/iter/utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/data/datapipes/iter/utils.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/utils/data/datapipes/iter/streamreader.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/data/datapipes/iter/streamreader.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/utils/data/datapipes/iter/sharding.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/data/datapipes/iter/sharding.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/utils/data/datapipes/iter/selecting.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/data/datapipes/iter/selecting.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/utils/data/datapipes/iter/routeddecoder.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/data/datapipes/iter/routeddecoder.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/utils/data/datapipes/iter/grouping.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/data/datapipes/iter/grouping.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/utils/data/datapipes/iter/fileopener.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/data/datapipes/iter/fileopener.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/utils/data/datapipes/iter/filelister.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/data/datapipes/iter/filelister.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/utils/data/datapipes/iter/combining.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/data/datapipes/iter/combining.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/utils/data/datapipes/iter/combinatorics.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/data/datapipes/iter/combinatorics.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/utils/data/datapipes/iter/callable.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/data/datapipes/iter/callable.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/utils/data/datapipes/iter/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/data/datapipes/iter/__init__.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/utils/data/datapipes/gen_pyi.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/data/datapipes/gen_pyi.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/utils/data/datapipes/datapipe.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/data/datapipes/datapipe.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/utils/data/datapipes/dataframe/structures.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/data/datapipes/dataframe/structures.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/utils/data/datapipes/dataframe/datapipes.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/data/datapipes/dataframe/datapipes.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/utils/data/datapipes/dataframe/dataframes.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/data/datapipes/dataframe/dataframes.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/utils/data/datapipes/dataframe/dataframe_wrapper.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/data/datapipes/dataframe/dataframe_wrapper.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/utils/data/datapipes/dataframe/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/data/datapipes/dataframe/__init__.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/utils/data/datapipes/_typing.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/data/datapipes/_typing.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/utils/data/datapipes/_hook_iterator.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/data/datapipes/_hook_iterator.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/utils/data/datapipes/_decorator.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/data/datapipes/_decorator.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/utils/data/datapipes/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/data/datapipes/__init__.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/utils/data/dataloader.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/data/dataloader.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/utils/data/backward_compatibility.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/data/backward_compatibility.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/utils/data/_utils/worker.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/data/_utils/worker.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/utils/data/_utils/signal_handling.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/data/_utils/signal_handling.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/utils/data/_utils/pin_memory.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/data/_utils/pin_memory.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/utils/data/_utils/fetch.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/data/_utils/fetch.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/utils/data/_utils/collate.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/data/_utils/collate.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/utils/data/_utils/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/data/_utils/__init__.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/utils/data/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/data/__init__.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/utils/cpp_extension.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/cpp_extension.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/utils/cpp_backtrace.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/cpp_backtrace.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/utils/collect_env.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/collect_env.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/utils/checkpoint.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/checkpoint.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/utils/bundled_inputs.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/bundled_inputs.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/utils/bottleneck/__main__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/bottleneck/__main__.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/utils/bottleneck/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/bottleneck/__init__.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/utils/benchmark/utils/valgrind_wrapper/timer_interface.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/benchmark/utils/valgrind_wrapper/timer_interface.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/utils/benchmark/utils/valgrind_wrapper/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/benchmark/utils/valgrind_wrapper/__init__.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/utils/benchmark/utils/timer.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/benchmark/utils/timer.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/utils/benchmark/utils/sparse_fuzzer.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/benchmark/utils/sparse_fuzzer.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/utils/benchmark/utils/fuzzer.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/benchmark/utils/fuzzer.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/utils/benchmark/utils/cpp_jit.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/benchmark/utils/cpp_jit.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/utils/benchmark/utils/compile.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/benchmark/utils/compile.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/utils/benchmark/utils/compare.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/benchmark/utils/compare.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/utils/benchmark/utils/common.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/benchmark/utils/common.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/utils/benchmark/utils/_stubs.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/benchmark/utils/_stubs.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/utils/benchmark/utils/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/benchmark/utils/__init__.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/utils/benchmark/op_fuzzers/unary.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/benchmark/op_fuzzers/unary.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/utils/benchmark/op_fuzzers/spectral.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/benchmark/op_fuzzers/spectral.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/utils/benchmark/op_fuzzers/sparse_unary.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/benchmark/op_fuzzers/sparse_unary.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/utils/benchmark/op_fuzzers/sparse_binary.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/benchmark/op_fuzzers/sparse_binary.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/utils/benchmark/op_fuzzers/binary.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/benchmark/op_fuzzers/binary.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/utils/benchmark/op_fuzzers/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/benchmark/op_fuzzers/__init__.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/utils/benchmark/examples/spectral_ops_fuzz_test.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/benchmark/examples/spectral_ops_fuzz_test.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/utils/benchmark/examples/sparse/op_benchmark.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/benchmark/examples/sparse/op_benchmark.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/utils/benchmark/examples/sparse/fuzzer.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/benchmark/examples/sparse/fuzzer.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/utils/benchmark/examples/sparse/compare.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/benchmark/examples/sparse/compare.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/utils/benchmark/examples/simple_timeit.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/benchmark/examples/simple_timeit.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/utils/benchmark/examples/op_benchmark.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/benchmark/examples/op_benchmark.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/utils/benchmark/examples/fuzzer.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/benchmark/examples/fuzzer.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/utils/benchmark/examples/compare.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/benchmark/examples/compare.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/utils/benchmark/examples/blas_compare_setup.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/benchmark/examples/blas_compare_setup.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/utils/benchmark/examples/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/benchmark/examples/__init__.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/utils/benchmark/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/benchmark/__init__.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/utils/backend_registration.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/backend_registration.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/utils/backcompat/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/backcompat/__init__.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/utils/_zip.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/_zip.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/utils/_typing_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/_typing_utils.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/utils/_triton.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/_triton.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/utils/_traceback.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/_traceback.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/utils/_sympy/value_ranges.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/_sympy/value_ranges.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/utils/_sympy/solve.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/_sympy/solve.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/utils/_sympy/singleton_int.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/_sympy/singleton_int.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/utils/_sympy/reference.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/_sympy/reference.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/utils/_sympy/interp.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/_sympy/interp.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/utils/_sympy/functions.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/_sympy/functions.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/utils/_sympy/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/_sympy/__init__.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/utils/_stats.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/_stats.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/utils/_pytree.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/_pytree.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/utils/_python_dispatch.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/_python_dispatch.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/utils/_mode_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/_mode_utils.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/utils/_import_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/_import_utils.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/utils/_freeze.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/_freeze.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/utils/_foreach_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/_foreach_utils.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/utils/_exposed_in.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/_exposed_in.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/utils/_device.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/_device.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/utils/_cxx_pytree.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/_cxx_pytree.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/utils/_cpp_extension_versioner.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/_cpp_extension_versioner.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/utils/_contextlib.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/_contextlib.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/utils/_content_store.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/_content_store.py
+ for f in `find ./torch/ -name '*.py'`
install -D -pm 644 ./torch/utils/_config_module.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/_config_module.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/types.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/types.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/torch_version.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/torch_version.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/two_tensor.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/two_tensor.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/triton_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/triton_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/torchbind_impls.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/torchbind_impls.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/test_module/no_future_div.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/test_module/no_future_div.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/test_module/future_div.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/test_module/future_div.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/test_module/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/test_module/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/static_module.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/static_module.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/quantization_torch_package_models.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/quantization_torch_package_models.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/optests/make_fx.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/optests/make_fx.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/optests/generate_tests.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/optests/generate_tests.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/optests/fake_tensor.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/optests/fake_tensor.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/optests/autograd_registration.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/optests/autograd_registration.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/optests/aot_autograd.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/optests/aot_autograd.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/optests/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/optests/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/opinfo/utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/opinfo/utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/opinfo/refs.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/opinfo/refs.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/opinfo/definitions/special.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/opinfo/definitions/special.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/opinfo/definitions/sparse.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/opinfo/definitions/sparse.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/opinfo/definitions/signal.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/opinfo/definitions/signal.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/opinfo/definitions/linalg.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/opinfo/definitions/linalg.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/opinfo/definitions/fft.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/opinfo/definitions/fft.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/opinfo/definitions/_masked.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/opinfo/definitions/_masked.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/opinfo/definitions/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/opinfo/definitions/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/opinfo/core.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/opinfo/core.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/opinfo/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/opinfo/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/logging_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/logging_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/logging_tensor.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/logging_tensor.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/jit_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/jit_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/jit_metaprogramming_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/jit_metaprogramming_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/inductor_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/inductor_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/hypothesis_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/hypothesis_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/hop_db.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/hop_db.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/generated/annotated_fn_args.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/generated/annotated_fn_args.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/generated/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/generated/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 
./torch/testing/_internal/dynamo_test_failures.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/dynamo_test_failures.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/distributed/rpc_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/distributed/rpc_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/distributed/rpc/tensorpipe_rpc_agent_test_fixture.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/distributed/rpc/tensorpipe_rpc_agent_test_fixture.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/distributed/rpc/rpc_test.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/distributed/rpc/rpc_test.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/distributed/rpc/rpc_agent_test_fixture.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/distributed/rpc/rpc_agent_test_fixture.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/distributed/rpc/jit/rpc_test_faulty.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/distributed/rpc/jit/rpc_test_faulty.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/distributed/rpc/jit/rpc_test.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/distributed/rpc/jit/rpc_test.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/distributed/rpc/jit/dist_autograd_test.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/distributed/rpc/jit/dist_autograd_test.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/distributed/rpc/jit/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/distributed/rpc/jit/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/distributed/rpc/faulty_rpc_agent_test_fixture.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/distributed/rpc/faulty_rpc_agent_test_fixture.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/distributed/rpc/faulty_agent_rpc_test.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/distributed/rpc/faulty_agent_rpc_test.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/distributed/rpc/examples/reinforcement_learning_rpc_test.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/distributed/rpc/examples/reinforcement_learning_rpc_test.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/distributed/rpc/examples/parameter_server_test.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/distributed/rpc/examples/parameter_server_test.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/distributed/rpc/examples/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/distributed/rpc/examples/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/distributed/rpc/dist_optimizer_test.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/distributed/rpc/dist_optimizer_test.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/distributed/rpc/dist_autograd_test.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/distributed/rpc/dist_autograd_test.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/distributed/rpc/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/distributed/rpc/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/distributed/pipeline/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/distributed/pipeline/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/distributed/pipe_with_ddp_test.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/distributed/pipe_with_ddp_test.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/distributed/nn/api/remote_module_test.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/distributed/nn/api/remote_module_test.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/distributed/nn/api/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/distributed/nn/api/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/distributed/nn/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/distributed/nn/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/distributed/multi_threaded_pg.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/distributed/multi_threaded_pg.py + for f in 
`find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/distributed/fake_pg.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/distributed/fake_pg.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/distributed/distributed_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/distributed/distributed_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/distributed/distributed_test.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/distributed/distributed_test.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/distributed/ddp_under_dist_autograd_test.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/distributed/ddp_under_dist_autograd_test.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/distributed/common_state_dict.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/distributed/common_state_dict.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/distributed/checkpoint_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/distributed/checkpoint_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/distributed/_tensor/common_dtensor.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/distributed/_tensor/common_dtensor.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/distributed/_tensor/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/distributed/_tensor/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/distributed/_shard/test_common.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/distributed/_shard/test_common.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/distributed/_shard/sharded_tensor/_test_st_common.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/distributed/_shard/sharded_tensor/_test_st_common.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/distributed/_shard/sharded_tensor/_test_ops_common.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/distributed/_shard/sharded_tensor/_test_ops_common.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/distributed/_shard/sharded_tensor/__init__.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/distributed/_shard/sharded_tensor/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/distributed/_shard/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/distributed/_shard/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/distributed/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/distributed/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/dist_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/dist_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/data/network2.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/data/network2.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/data/network1.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/data/network1.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/data/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/data/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/custom_op_db.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/custom_op_db.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/composite_compliance.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/composite_compliance.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/common_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/common_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/common_subclass.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/common_subclass.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/common_quantized.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/common_quantized.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/common_quantization.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/common_quantization.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 
./torch/testing/_internal/common_pruning.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/common_pruning.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/common_optimizers.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/common_optimizers.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/common_nn.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/common_nn.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/common_modules.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/common_modules.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/common_mkldnn.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/common_mkldnn.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/common_methods_invocations.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/common_methods_invocations.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/common_jit.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/common_jit.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/common_fsdp.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/common_fsdp.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/common_dtype.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/common_dtype.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/common_distributed.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/common_distributed.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/common_dist_composable.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/common_dist_composable.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/common_device_type.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/common_device_type.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/common_cuda.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/common_cuda.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 
./torch/testing/_internal/codegen/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/codegen/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/check_kernel_launches.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/check_kernel_launches.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/autograd_function_db.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/autograd_function_db.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/autocast_test_lists.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/autocast_test_lists.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_creation.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_creation.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_comparison.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_comparison.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/storage.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/storage.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/special/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/special/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/sparse/semi_structured.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/sparse/semi_structured.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/sparse/_triton_ops_meta.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/sparse/_triton_ops_meta.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/sparse/_triton_ops.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/sparse/_triton_ops.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/sparse/_semi_structured_ops.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/sparse/_semi_structured_ops.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 
./torch/sparse/_semi_structured_conversions.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/sparse/_semi_structured_conversions.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/sparse/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/sparse/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/signal/windows/windows.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/signal/windows/windows.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/signal/windows/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/signal/windows/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/signal/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/signal/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/serialization.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/serialization.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/return_types.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/return_types.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/random.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/random.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/quasirandom.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/quasirandom.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/quantization/utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/quantization/utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/quantization/stubs.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/quantization/stubs.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/quantization/quantize_jit.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/quantization/quantize_jit.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/quantization/quantize_fx.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/quantization/quantize_fx.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/quantization/quantize.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/quantization/quantize.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/quantization/quantization_mappings.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/quantization/quantization_mappings.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/quantization/quant_type.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/quantization/quant_type.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/quantization/qconfig.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/quantization/qconfig.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/quantization/observer.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/quantization/observer.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/quantization/fx/utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/quantization/fx/utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/quantization/fx/quantization_types.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/quantization/fx/quantization_types.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/quantization/fx/quantization_patterns.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/quantization/fx/quantization_patterns.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/quantization/fx/prepare.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/quantization/fx/prepare.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/quantization/fx/pattern_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/quantization/fx/pattern_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/quantization/fx/match_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/quantization/fx/match_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/quantization/fx/graph_module.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/quantization/fx/graph_module.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/quantization/fx/fusion_patterns.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/quantization/fx/fusion_patterns.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/quantization/fx/fuse.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/quantization/fx/fuse.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/quantization/fx/convert.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/quantization/fx/convert.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 
./torch/quantization/fx/_equalize.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/quantization/fx/_equalize.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/quantization/fx/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/quantization/fx/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/quantization/fuser_method_mappings.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/quantization/fuser_method_mappings.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/quantization/fuse_modules.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/quantization/fuse_modules.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/quantization/fake_quantize.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/quantization/fake_quantize.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/quantization/_quantized_conversions.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/quantization/_quantized_conversions.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/quantization/_numeric_suite_fx.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/quantization/_numeric_suite_fx.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/quantization/_numeric_suite.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/quantization/_numeric_suite.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/quantization/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/quantization/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/profiler/python_tracer.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/profiler/python_tracer.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/profiler/profiler.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/profiler/profiler.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/profiler/itt.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/profiler/itt.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/profiler/_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/profiler/_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/profiler/_pattern_matcher.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/profiler/_pattern_matcher.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 
./torch/profiler/_memory_profiler.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/profiler/_memory_profiler.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/profiler/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/profiler/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/package/package_importer.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/package/package_importer.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/package/package_exporter.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/package/package_exporter.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/package/importer.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/package/importer.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/package/glob_group.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/package/glob_group.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/package/find_file_dependencies.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/package/find_file_dependencies.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/package/file_structure_representation.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/package/file_structure_representation.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/package/analyze/trace_dependencies.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/package/analyze/trace_dependencies.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/package/analyze/is_from_package.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/package/analyze/is_from_package.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/package/analyze/find_first_use_of_broken_modules.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/package/analyze/find_first_use_of_broken_modules.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/package/analyze/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/package/analyze/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/package/_stdlib.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/package/_stdlib.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/package/_package_unpickler.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/package/_package_unpickler.py + for f in `find 
./torch/ -name '*.py'` + install -D -pm 644 ./torch/package/_package_pickler.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/package/_package_pickler.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/package/_mock.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/package/_mock.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/package/_mangling.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/package/_mangling.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/package/_importlib.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/package/_importlib.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/package/_directory_reader.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/package/_directory_reader.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/package/_digraph.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/package/_digraph.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/package/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/package/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/overrides.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/overrides.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/optim/swa_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/optim/swa_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/optim/sparse_adam.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/optim/sparse_adam.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/optim/sgd.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/optim/sgd.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/optim/rprop.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/optim/rprop.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/optim/rmsprop.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/optim/rmsprop.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/optim/radam.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/optim/radam.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/optim/optimizer.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/optim/optimizer.py + for f in `find ./torch/ -name '*.py'` + install -D 
-pm 644 ./torch/optim/nadam.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/optim/nadam.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/optim/lr_scheduler.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/optim/lr_scheduler.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/optim/lbfgs.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/optim/lbfgs.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/optim/asgd.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/optim/asgd.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/optim/adamw.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/optim/adamw.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/optim/adamax.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/optim/adamax.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/optim/adam.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/optim/adam.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/optim/adagrad.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/optim/adagrad.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/optim/adadelta.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/optim/adadelta.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/optim/_multi_tensor/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/optim/_multi_tensor/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/optim/_functional.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/optim/_functional.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/optim/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/optim/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/verification.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/verification.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/symbolic_opset9.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/symbolic_opset9.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/symbolic_opset8.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/symbolic_opset8.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/symbolic_opset7.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/symbolic_opset7.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/symbolic_opset20.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/symbolic_opset20.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/symbolic_opset19.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/symbolic_opset19.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/symbolic_opset18.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/symbolic_opset18.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/symbolic_opset17.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/symbolic_opset17.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/symbolic_opset16.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/symbolic_opset16.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/symbolic_opset15.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/symbolic_opset15.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/symbolic_opset14.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/symbolic_opset14.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/symbolic_opset13.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/symbolic_opset13.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/symbolic_opset12.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/symbolic_opset12.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/symbolic_opset11.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/symbolic_opset11.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/symbolic_opset10.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/symbolic_opset10.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/symbolic_helper.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/symbolic_helper.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/symbolic_caffe2.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/symbolic_caffe2.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/operators.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/operators.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/errors.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/errors.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_type_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_type_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_onnx_supported_ops.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_onnx_supported_ops.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/registration.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/registration.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/onnxruntime.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/onnxruntime.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/onnx_proto_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/onnx_proto_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/jit_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/jit_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/io_adapter.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/io_adapter.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/fx/type_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/fx/type_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/fx/torch_export_graph_extractor.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/fx/torch_export_graph_extractor.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/fx/serialization.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/fx/serialization.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/fx/registration.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/fx/registration.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 
./torch/onnx/_internal/fx/patcher.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/fx/patcher.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/fx/passes/virtualization.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/fx/passes/virtualization.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/fx/passes/type_promotion.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/fx/passes/type_promotion.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/fx/passes/readability.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/fx/passes/readability.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/fx/passes/modularization.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/fx/passes/modularization.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/fx/passes/functionalization.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/fx/passes/functionalization.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/fx/passes/decomp.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/fx/passes/decomp.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/fx/passes/_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/fx/passes/_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/fx/passes/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/fx/passes/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/fx/op_validation.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/fx/op_validation.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/fx/onnxfunction_dispatcher.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/fx/onnxfunction_dispatcher.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/fx/fx_symbolic_graph_extractor.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/fx/fx_symbolic_graph_extractor.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/fx/fx_onnx_interpreter.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/fx/fx_onnx_interpreter.py + for f in `find ./torch/ 
-name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/fx/dynamo_graph_extractor.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/fx/dynamo_graph_extractor.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/fx/diagnostics.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/fx/diagnostics.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/fx/decomposition_table.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/fx/decomposition_table.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/fx/decomposition_skip.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/fx/decomposition_skip.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/fx/analysis/unsupported_nodes.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/fx/analysis/unsupported_nodes.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/fx/analysis/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/fx/analysis/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/fx/_pass.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/fx/_pass.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/fx/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/fx/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/exporter.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/exporter.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/diagnostics/infra/utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/infra/utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/diagnostics/infra/sarif/version.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/infra/sarif/version.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/diagnostics/infra/sarif/_web_response.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/infra/sarif/_web_response.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/diagnostics/infra/sarif/_web_request.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/infra/sarif/_web_request.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/diagnostics/infra/sarif/_version_control_details.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/infra/sarif/_version_control_details.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/diagnostics/infra/sarif/_translation_metadata.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/infra/sarif/_translation_metadata.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/diagnostics/infra/sarif/_tool_component_reference.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/infra/sarif/_tool_component_reference.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/diagnostics/infra/sarif/_tool_component.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/infra/sarif/_tool_component.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/diagnostics/infra/sarif/_tool.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/infra/sarif/_tool.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/diagnostics/infra/sarif/_thread_flow_location.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/infra/sarif/_thread_flow_location.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/diagnostics/infra/sarif/_thread_flow.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/infra/sarif/_thread_flow.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/diagnostics/infra/sarif/_suppression.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/infra/sarif/_suppression.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/diagnostics/infra/sarif/_stack_frame.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/infra/sarif/_stack_frame.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/diagnostics/infra/sarif/_stack.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/infra/sarif/_stack.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/diagnostics/infra/sarif/_special_locations.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/infra/sarif/_special_locations.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/diagnostics/infra/sarif/_sarif_log.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/infra/sarif/_sarif_log.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/diagnostics/infra/sarif/_run_automation_details.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/infra/sarif/_run_automation_details.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/diagnostics/infra/sarif/_run.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/infra/sarif/_run.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/diagnostics/infra/sarif/_result_provenance.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/infra/sarif/_result_provenance.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/diagnostics/infra/sarif/_result.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/infra/sarif/_result.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/diagnostics/infra/sarif/_reporting_descriptor_relationship.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/infra/sarif/_reporting_descriptor_relationship.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/diagnostics/infra/sarif/_reporting_descriptor_reference.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/infra/sarif/_reporting_descriptor_reference.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/diagnostics/infra/sarif/_reporting_descriptor.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/infra/sarif/_reporting_descriptor.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/diagnostics/infra/sarif/_reporting_configuration.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/infra/sarif/_reporting_configuration.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/diagnostics/infra/sarif/_replacement.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/infra/sarif/_replacement.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/diagnostics/infra/sarif/_region.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/infra/sarif/_region.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/diagnostics/infra/sarif/_rectangle.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/infra/sarif/_rectangle.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/diagnostics/infra/sarif/_property_bag.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/infra/sarif/_property_bag.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/diagnostics/infra/sarif/_physical_location.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/infra/sarif/_physical_location.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/diagnostics/infra/sarif/_notification.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/infra/sarif/_notification.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/diagnostics/infra/sarif/_node.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/infra/sarif/_node.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/diagnostics/infra/sarif/_multiformat_message_string.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/infra/sarif/_multiformat_message_string.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/diagnostics/infra/sarif/_message.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/infra/sarif/_message.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/diagnostics/infra/sarif/_logical_location.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/infra/sarif/_logical_location.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/diagnostics/infra/sarif/_location_relationship.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/infra/sarif/_location_relationship.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/diagnostics/infra/sarif/_location.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/infra/sarif/_location.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/diagnostics/infra/sarif/_invocation.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/infra/sarif/_invocation.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/diagnostics/infra/sarif/_graph_traversal.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/infra/sarif/_graph_traversal.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/diagnostics/infra/sarif/_graph.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/infra/sarif/_graph.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/diagnostics/infra/sarif/_fix.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/infra/sarif/_fix.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/diagnostics/infra/sarif/_external_property_file_references.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/infra/sarif/_external_property_file_references.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/diagnostics/infra/sarif/_external_property_file_reference.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/infra/sarif/_external_property_file_reference.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/diagnostics/infra/sarif/_external_properties.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/infra/sarif/_external_properties.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/diagnostics/infra/sarif/_exception.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/infra/sarif/_exception.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/diagnostics/infra/sarif/_edge_traversal.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/infra/sarif/_edge_traversal.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/diagnostics/infra/sarif/_edge.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/infra/sarif/_edge.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/diagnostics/infra/sarif/_conversion.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/infra/sarif/_conversion.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/diagnostics/infra/sarif/_configuration_override.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/infra/sarif/_configuration_override.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/diagnostics/infra/sarif/_code_flow.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/infra/sarif/_code_flow.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/diagnostics/infra/sarif/_attachment.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/infra/sarif/_attachment.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/diagnostics/infra/sarif/_artifact_location.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/infra/sarif/_artifact_location.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/diagnostics/infra/sarif/_artifact_content.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/infra/sarif/_artifact_content.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/diagnostics/infra/sarif/_artifact_change.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/infra/sarif/_artifact_change.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/diagnostics/infra/sarif/_artifact.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/infra/sarif/_artifact.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/diagnostics/infra/sarif/_address.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/infra/sarif/_address.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/diagnostics/infra/sarif/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/infra/sarif/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/diagnostics/infra/formatter.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/infra/formatter.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/diagnostics/infra/decorator.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/infra/decorator.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/diagnostics/infra/context.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/infra/context.py + for f in `find ./torch/ -name '*.py'` + install 
-D -pm 644 ./torch/onnx/_internal/diagnostics/infra/_infra.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/infra/_infra.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/diagnostics/infra/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/infra/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/diagnostics/_rules.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/_rules.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/diagnostics/_diagnostic.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/_diagnostic.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/diagnostics/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/_beartype.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/_beartype.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_globals.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_globals.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_exporter_states.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_exporter_states.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_experimental.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_experimental.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_deprecation.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_deprecation.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_constants.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_constants.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/utils/weight_norm.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/utils/weight_norm.py + for f in `find ./torch/ -name '*.py'` + 
install -D -pm 644 ./torch/nn/utils/stateless.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/utils/stateless.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/utils/spectral_norm.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/utils/spectral_norm.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/utils/rnn.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/utils/rnn.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/utils/prune.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/utils/prune.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/utils/parametrize.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/utils/parametrize.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/utils/parametrizations.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/utils/parametrizations.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/utils/memory_format.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/utils/memory_format.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/utils/init.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/utils/init.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/utils/fusion.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/utils/fusion.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/utils/convert_parameters.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/utils/convert_parameters.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/utils/clip_grad.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/utils/clip_grad.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/utils/_per_sample_grad.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/utils/_per_sample_grad.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/utils/_named_member_accessor.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/utils/_named_member_accessor.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/utils/_expanded_weights/linear_expanded_weights.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/utils/_expanded_weights/linear_expanded_weights.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 
./torch/nn/utils/_expanded_weights/layer_norm_expanded_weights.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/utils/_expanded_weights/layer_norm_expanded_weights.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/utils/_expanded_weights/instance_norm_expanded_weights.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/utils/_expanded_weights/instance_norm_expanded_weights.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/utils/_expanded_weights/group_norm_expanded_weights.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/utils/_expanded_weights/group_norm_expanded_weights.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/utils/_expanded_weights/expanded_weights_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/utils/_expanded_weights/expanded_weights_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/utils/_expanded_weights/expanded_weights_impl.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/utils/_expanded_weights/expanded_weights_impl.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/utils/_expanded_weights/embedding_expanded_weights.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/utils/_expanded_weights/embedding_expanded_weights.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/utils/_expanded_weights/conv_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/utils/_expanded_weights/conv_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/utils/_expanded_weights/conv_expanded_weights.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/utils/_expanded_weights/conv_expanded_weights.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/utils/_expanded_weights/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/utils/_expanded_weights/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/utils/_deprecation_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/utils/_deprecation_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/utils/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/utils/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/quantized/modules/utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/quantized/modules/utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/quantized/modules/rnn.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/quantized/modules/rnn.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/quantized/modules/normalization.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/quantized/modules/normalization.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/quantized/modules/linear.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/quantized/modules/linear.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/quantized/modules/functional_modules.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/quantized/modules/functional_modules.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/quantized/modules/embedding_ops.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/quantized/modules/embedding_ops.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/quantized/modules/dropout.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/quantized/modules/dropout.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/quantized/modules/conv.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/quantized/modules/conv.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/quantized/modules/batchnorm.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/quantized/modules/batchnorm.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/quantized/modules/activation.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/quantized/modules/activation.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/quantized/modules/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/quantized/modules/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/quantized/functional.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/quantized/functional.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/quantized/dynamic/modules/rnn.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/quantized/dynamic/modules/rnn.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/quantized/dynamic/modules/linear.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/quantized/dynamic/modules/linear.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/quantized/dynamic/modules/conv.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/quantized/dynamic/modules/conv.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/quantized/dynamic/modules/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/quantized/dynamic/modules/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/quantized/dynamic/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/quantized/dynamic/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/quantized/_reference/modules/utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/quantized/_reference/modules/utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/quantized/_reference/modules/sparse.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/quantized/_reference/modules/sparse.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/quantized/_reference/modules/rnn.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/quantized/_reference/modules/rnn.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/quantized/_reference/modules/linear.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/quantized/_reference/modules/linear.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/quantized/_reference/modules/conv.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/quantized/_reference/modules/conv.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/quantized/_reference/modules/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/quantized/_reference/modules/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/quantized/_reference/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/quantized/_reference/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/quantized/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/quantized/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/quantizable/modules/rnn.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/quantizable/modules/rnn.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/quantizable/modules/activation.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/quantizable/modules/activation.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/quantizable/modules/__init__.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/quantizable/modules/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/quantizable/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/quantizable/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/qat/modules/linear.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/qat/modules/linear.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/qat/modules/embedding_ops.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/qat/modules/embedding_ops.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/qat/modules/conv.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/qat/modules/conv.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/qat/modules/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/qat/modules/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/qat/dynamic/modules/linear.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/qat/dynamic/modules/linear.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/qat/dynamic/modules/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/qat/dynamic/modules/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/qat/dynamic/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/qat/dynamic/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/qat/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/qat/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/parameter.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/parameter.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/parallel/scatter_gather.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/parallel/scatter_gather.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/parallel/replicate.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/parallel/replicate.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/parallel/parallel_apply.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/parallel/parallel_apply.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/parallel/distributed.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/parallel/distributed.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/parallel/data_parallel.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/parallel/data_parallel.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/parallel/comm.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/parallel/comm.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/parallel/_functions.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/parallel/_functions.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/parallel/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/parallel/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/modules/utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/modules/utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/modules/upsampling.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/modules/upsampling.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/modules/transformer.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/modules/transformer.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/modules/sparse.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/modules/sparse.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/modules/rnn.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/modules/rnn.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/modules/pooling.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/modules/pooling.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/modules/pixelshuffle.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/modules/pixelshuffle.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/modules/padding.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/modules/padding.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/modules/normalization.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/modules/normalization.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/modules/module.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/modules/module.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/modules/loss.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/modules/loss.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/modules/linear.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/modules/linear.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/modules/lazy.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/modules/lazy.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/modules/instancenorm.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/modules/instancenorm.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/modules/fold.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/modules/fold.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/modules/flatten.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/modules/flatten.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/modules/dropout.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/modules/dropout.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/modules/distance.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/modules/distance.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/modules/conv.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/modules/conv.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/modules/container.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/modules/container.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/modules/channelshuffle.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/modules/channelshuffle.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/modules/batchnorm.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/modules/batchnorm.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/modules/adaptive.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/modules/adaptive.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/modules/activation.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/modules/activation.py + for f in `find ./torch/ 
-name '*.py'`
+ install -D -pm 644 ./torch/nn/modules/_functions.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/modules/_functions.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/nn/modules/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/modules/__init__.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/nn/intrinsic/quantized/modules/linear_relu.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/intrinsic/quantized/modules/linear_relu.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/nn/intrinsic/quantized/modules/conv_relu.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/intrinsic/quantized/modules/conv_relu.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/nn/intrinsic/quantized/modules/bn_relu.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/intrinsic/quantized/modules/bn_relu.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/nn/intrinsic/quantized/modules/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/intrinsic/quantized/modules/__init__.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/nn/intrinsic/quantized/dynamic/modules/linear_relu.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/intrinsic/quantized/dynamic/modules/linear_relu.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/nn/intrinsic/quantized/dynamic/modules/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/intrinsic/quantized/dynamic/modules/__init__.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/nn/intrinsic/quantized/dynamic/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/intrinsic/quantized/dynamic/__init__.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/nn/intrinsic/quantized/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/intrinsic/quantized/__init__.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/nn/intrinsic/qat/modules/linear_relu.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/intrinsic/qat/modules/linear_relu.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/nn/intrinsic/qat/modules/linear_fused.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/intrinsic/qat/modules/linear_fused.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/nn/intrinsic/qat/modules/conv_fused.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/intrinsic/qat/modules/conv_fused.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/nn/intrinsic/qat/modules/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/intrinsic/qat/modules/__init__.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/nn/intrinsic/qat/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/intrinsic/qat/__init__.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/nn/intrinsic/modules/fused.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/intrinsic/modules/fused.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/nn/intrinsic/modules/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/intrinsic/modules/__init__.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/nn/intrinsic/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/intrinsic/__init__.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/nn/init.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/init.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/nn/grad.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/grad.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/nn/functional.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/functional.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/nn/cpp.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/cpp.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/nn/common_types.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/common_types.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/nn/backends/thnn.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/backends/thnn.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/nn/backends/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/backends/__init__.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/nn/attention/bias.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/attention/bias.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/nn/attention/_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/attention/_utils.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/nn/attention/_templated_attention.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/attention/_templated_attention.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/nn/attention/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/attention/__init__.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/nn/_reduction.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/_reduction.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/nn/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/__init__.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/nested/_internal/sdpa.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/nested/_internal/sdpa.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/nested/_internal/ops.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/nested/_internal/ops.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/nested/_internal/nested_tensor.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/nested/_internal/nested_tensor.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/nested/_internal/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/nested/_internal/__init__.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/nested/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/nested/__init__.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/multiprocessing/spawn.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/multiprocessing/spawn.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/multiprocessing/reductions.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/multiprocessing/reductions.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/multiprocessing/queue.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/multiprocessing/queue.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/multiprocessing/pool.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/multiprocessing/pool.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/multiprocessing/_atfork.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/multiprocessing/_atfork.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/multiprocessing/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/multiprocessing/__init__.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/mps/profiler.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/mps/profiler.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/mps/event.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/mps/event.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/mps/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/mps/__init__.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/monitor/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/monitor/__init__.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/masked/maskedtensor/unary.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/masked/maskedtensor/unary.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/masked/maskedtensor/reductions.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/masked/maskedtensor/reductions.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/masked/maskedtensor/passthrough.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/masked/maskedtensor/passthrough.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/masked/maskedtensor/creation.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/masked/maskedtensor/creation.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/masked/maskedtensor/core.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/masked/maskedtensor/core.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/masked/maskedtensor/binary.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/masked/maskedtensor/binary.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/masked/maskedtensor/_ops_refs.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/masked/maskedtensor/_ops_refs.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/masked/maskedtensor/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/masked/maskedtensor/__init__.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/masked/_ops.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/masked/_ops.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/masked/_docs.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/masked/_docs.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/masked/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/masked/__init__.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/linalg/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/linalg/__init__.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/library.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/library.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/jit/unsupported_tensor_ops.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/jit/unsupported_tensor_ops.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/jit/supported_ops.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/jit/supported_ops.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/jit/quantized.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/jit/quantized.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/jit/mobile/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/jit/mobile/__init__.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/jit/generate_bytecode.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/jit/generate_bytecode.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/jit/frontend.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/jit/frontend.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/jit/annotations.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/jit/annotations.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/jit/_trace.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/jit/_trace.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/jit/_state.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/jit/_state.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/jit/_shape_functions.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/jit/_shape_functions.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/jit/_serialization.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/jit/_serialization.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/jit/_script.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/jit/_script.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/jit/_recursive.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/jit/_recursive.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/jit/_pickle.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/jit/_pickle.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/jit/_passes/_property_propagation.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/jit/_passes/_property_propagation.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/jit/_passes/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/jit/_passes/__init__.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/jit/_monkeytype_config.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/jit/_monkeytype_config.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/jit/_logging.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/jit/_logging.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/jit/_ir_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/jit/_ir_utils.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/jit/_fuser.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/jit/_fuser.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/jit/_freeze.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/jit/_freeze.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/jit/_decompositions.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/jit/_decompositions.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/jit/_decomposition_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/jit/_decomposition_utils.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/jit/_dataclass_impls.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/jit/_dataclass_impls.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/jit/_check.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/jit/_check.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/jit/_builtins.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/jit/_builtins.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/jit/_await.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/jit/_await.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/jit/_async.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/jit/_async.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/jit/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/jit/__init__.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/hub.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/hub.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/fx/traceback.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/traceback.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/fx/tensor_type.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/tensor_type.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/fx/subgraph_rewriter.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/subgraph_rewriter.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/fx/proxy.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/proxy.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/fx/passes/utils/source_matcher_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/passes/utils/source_matcher_utils.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/fx/passes/utils/matcher_with_name_node_map_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/passes/utils/matcher_with_name_node_map_utils.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/fx/passes/utils/matcher_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/passes/utils/matcher_utils.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/fx/passes/utils/fuser_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/passes/utils/fuser_utils.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/fx/passes/utils/common.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/passes/utils/common.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/fx/passes/utils/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/passes/utils/__init__.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/fx/passes/tools_common.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/passes/tools_common.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/fx/passes/tests/test_pass_manager.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/passes/tests/test_pass_manager.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/fx/passes/tests/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/passes/tests/__init__.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/fx/passes/splitter_base.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/passes/splitter_base.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/fx/passes/split_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/passes/split_utils.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/fx/passes/split_module.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/passes/split_module.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/fx/passes/shape_prop.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/passes/shape_prop.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/fx/passes/reinplace.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/passes/reinplace.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/fx/passes/pass_manager.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/passes/pass_manager.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/fx/passes/param_fetch.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/passes/param_fetch.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/fx/passes/operator_support.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/passes/operator_support.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/fx/passes/net_min_base.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/passes/net_min_base.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/fx/passes/infra/pass_manager.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/passes/infra/pass_manager.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/fx/passes/infra/pass_base.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/passes/infra/pass_base.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/fx/passes/infra/partitioner.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/passes/infra/partitioner.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/fx/passes/infra/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/passes/infra/__init__.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/fx/passes/graph_manipulation.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/passes/graph_manipulation.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/fx/passes/graph_drawer.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/passes/graph_drawer.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/fx/passes/fake_tensor_prop.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/passes/fake_tensor_prop.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/fx/passes/dialect/common/cse_pass.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/passes/dialect/common/cse_pass.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/fx/passes/dialect/common/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/passes/dialect/common/__init__.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/fx/passes/dialect/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/passes/dialect/__init__.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/fx/passes/backends/cudagraphs.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/passes/backends/cudagraphs.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/fx/passes/backends/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/passes/backends/__init__.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/fx/passes/annotate_getitem_nodes.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/passes/annotate_getitem_nodes.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/fx/passes/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/passes/__init__.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/fx/operator_schemas.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/operator_schemas.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/fx/node.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/node.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/fx/interpreter.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/interpreter.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/fx/immutable_collections.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/immutable_collections.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/fx/graph_module.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/graph_module.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/fx/graph.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/graph.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/fx/experimental/validator.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/experimental/validator.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/fx/experimental/unify_refinements.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/experimental/unify_refinements.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/fx/experimental/unification/variable.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/experimental/unification/variable.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/fx/experimental/unification/utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/experimental/unification/utils.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/fx/experimental/unification/unification_tools.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/experimental/unification/unification_tools.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/fx/experimental/unification/multipledispatch/variadic.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/experimental/unification/multipledispatch/variadic.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/fx/experimental/unification/multipledispatch/utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/experimental/unification/multipledispatch/utils.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/fx/experimental/unification/multipledispatch/dispatcher.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/experimental/unification/multipledispatch/dispatcher.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/fx/experimental/unification/multipledispatch/core.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/experimental/unification/multipledispatch/core.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/fx/experimental/unification/multipledispatch/conflict.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/experimental/unification/multipledispatch/conflict.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/fx/experimental/unification/multipledispatch/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/experimental/unification/multipledispatch/__init__.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/fx/experimental/unification/more.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/experimental/unification/more.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/fx/experimental/unification/match.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/experimental/unification/match.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/fx/experimental/unification/dispatch.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/experimental/unification/dispatch.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/fx/experimental/unification/core.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/experimental/unification/core.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/fx/experimental/unification/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/experimental/unification/__init__.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/fx/experimental/symbolic_shapes.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/experimental/symbolic_shapes.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/fx/experimental/sym_node.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/experimental/sym_node.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/fx/experimental/shape_inference/infer_symbol_values.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/experimental/shape_inference/infer_symbol_values.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/fx/experimental/shape_inference/infer_shape.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/experimental/shape_inference/infer_shape.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/fx/experimental/schema_type_annotation.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/experimental/schema_type_annotation.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/fx/experimental/rewriter.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/experimental/rewriter.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/fx/experimental/refinement_types.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/experimental/refinement_types.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/fx/experimental/recording.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/experimental/recording.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/fx/experimental/proxy_tensor.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/experimental/proxy_tensor.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/fx/experimental/partitioner_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/experimental/partitioner_utils.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/fx/experimental/optimization.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/experimental/optimization.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/fx/experimental/normalize.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/experimental/normalize.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/fx/experimental/migrate_gradual_types/z3_types.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/experimental/migrate_gradual_types/z3_types.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/fx/experimental/migrate_gradual_types/util.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/experimental/migrate_gradual_types/util.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/fx/experimental/migrate_gradual_types/transform_to_z3.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/experimental/migrate_gradual_types/transform_to_z3.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/fx/experimental/migrate_gradual_types/operation.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/experimental/migrate_gradual_types/operation.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/fx/experimental/migrate_gradual_types/constraint_transformation.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/experimental/migrate_gradual_types/constraint_transformation.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/fx/experimental/migrate_gradual_types/constraint_generator.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/experimental/migrate_gradual_types/constraint_generator.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/fx/experimental/migrate_gradual_types/constraint.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/experimental/migrate_gradual_types/constraint.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/fx/experimental/migrate_gradual_types/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/experimental/migrate_gradual_types/__init__.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/fx/experimental/meta_tracer.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/experimental/meta_tracer.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/fx/experimental/merge_matmul.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/experimental/merge_matmul.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/fx/experimental/graph_gradual_typechecker.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/experimental/graph_gradual_typechecker.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/fx/experimental/debug.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/experimental/debug.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/fx/experimental/const_fold.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/experimental/const_fold.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/fx/experimental/accelerator_partitioner.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/experimental/accelerator_partitioner.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/fx/experimental/_sym_dispatch_mode.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/experimental/_sym_dispatch_mode.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/fx/experimental/_config.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/experimental/_config.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/fx/experimental/_backward_state.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/experimental/_backward_state.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/fx/experimental/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/experimental/__init__.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/fx/config.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/config.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/fx/annotate.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/annotate.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/fx/_symbolic_trace.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/_symbolic_trace.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/fx/_pytree.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/_pytree.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/fx/_lazy_graph_module.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/_lazy_graph_module.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/fx/_compatibility.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/_compatibility.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/fx/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/__init__.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/futures/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/futures/__init__.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/functional.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/functional.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/func/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/func/__init__.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/fft/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/fft/__init__.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/export/unflatten.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/export/unflatten.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/export/graph_signature.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/export/graph_signature.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/export/exported_program.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/export/exported_program.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/export/dynamic_shapes.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/export/dynamic_shapes.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/export/custom_obj.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/export/custom_obj.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/export/_unlift.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/export/_unlift.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/export/_tree_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/export/_tree_utils.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/export/_trace.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/export/_trace.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/export/_safeguard.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/export/_safeguard.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/export/_remove_effect_tokens_pass.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/export/_remove_effect_tokens_pass.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/export/_remove_auto_functionalized_pass.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/export/_remove_auto_functionalized_pass.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/export/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/export/__init__.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/distributions/wishart.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributions/wishart.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/distributions/weibull.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributions/weibull.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/distributions/von_mises.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributions/von_mises.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/distributions/utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributions/utils.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/distributions/uniform.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributions/uniform.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/distributions/transforms.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributions/transforms.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/distributions/transformed_distribution.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributions/transformed_distribution.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/distributions/studentT.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributions/studentT.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/distributions/relaxed_categorical.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributions/relaxed_categorical.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/distributions/relaxed_bernoulli.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributions/relaxed_bernoulli.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/distributions/poisson.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributions/poisson.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/distributions/pareto.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributions/pareto.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/distributions/one_hot_categorical.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributions/one_hot_categorical.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/distributions/normal.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributions/normal.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/distributions/negative_binomial.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributions/negative_binomial.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/distributions/multivariate_normal.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributions/multivariate_normal.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/distributions/multinomial.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributions/multinomial.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/distributions/mixture_same_family.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributions/mixture_same_family.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/distributions/lowrank_multivariate_normal.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributions/lowrank_multivariate_normal.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/distributions/logistic_normal.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributions/logistic_normal.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/distributions/log_normal.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributions/log_normal.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/distributions/lkj_cholesky.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributions/lkj_cholesky.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/distributions/laplace.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributions/laplace.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/distributions/kumaraswamy.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributions/kumaraswamy.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/distributions/kl.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributions/kl.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/distributions/inverse_gamma.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributions/inverse_gamma.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/distributions/independent.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributions/independent.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/distributions/half_normal.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributions/half_normal.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/distributions/half_cauchy.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributions/half_cauchy.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/distributions/gumbel.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributions/gumbel.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/distributions/geometric.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributions/geometric.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/distributions/gamma.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributions/gamma.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/distributions/fishersnedecor.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributions/fishersnedecor.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/distributions/exponential.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributions/exponential.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/distributions/exp_family.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributions/exp_family.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/distributions/distribution.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributions/distribution.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/distributions/dirichlet.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributions/dirichlet.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/distributions/continuous_bernoulli.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributions/continuous_bernoulli.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/distributions/constraints.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributions/constraints.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/distributions/constraint_registry.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributions/constraint_registry.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/distributions/chi2.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributions/chi2.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/distributions/cauchy.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributions/cauchy.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/distributions/categorical.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributions/categorical.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/distributions/binomial.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributions/binomial.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/distributions/beta.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributions/beta.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/distributions/bernoulli.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributions/bernoulli.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/distributions/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributions/__init__.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/distributed/utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/utils.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/distributed/tensor/parallel/style.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/tensor/parallel/style.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/distributed/tensor/parallel/loss.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/tensor/parallel/loss.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/distributed/tensor/parallel/input_reshard.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/tensor/parallel/input_reshard.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/distributed/tensor/parallel/fsdp.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/tensor/parallel/fsdp.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/distributed/tensor/parallel/ddp.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/tensor/parallel/ddp.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/distributed/tensor/parallel/api.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/tensor/parallel/api.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/distributed/tensor/parallel/_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/tensor/parallel/_utils.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/distributed/tensor/parallel/_data_parallel_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/tensor/parallel/_data_parallel_utils.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/distributed/tensor/parallel/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/tensor/parallel/__init__.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/distributed/tensor/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/tensor/__init__.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/distributed/run.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/run.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/distributed/rpc/server_process_global_profiler.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/rpc/server_process_global_profiler.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/distributed/rpc/rref_proxy.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/rpc/rref_proxy.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/distributed/rpc/options.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/rpc/options.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/distributed/rpc/internal.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/rpc/internal.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/distributed/rpc/functions.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/rpc/functions.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/distributed/rpc/constants.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/rpc/constants.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/distributed/rpc/backend_registry.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/rpc/backend_registry.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/distributed/rpc/api.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/rpc/api.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/distributed/rpc/_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/rpc/_utils.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/distributed/rpc/_testing/faulty_agent_backend_registry.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/rpc/_testing/faulty_agent_backend_registry.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/distributed/rpc/_testing/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/rpc/_testing/__init__.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/distributed/rpc/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/rpc/__init__.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/distributed/rendezvous.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/rendezvous.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/distributed/remote_device.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/remote_device.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/distributed/pipeline/sync/worker.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/pipeline/sync/worker.py
+ for f in `find ./torch/
-name '*.py'` + install -D -pm 644 ./torch/distributed/pipeline/sync/utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/pipeline/sync/utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/pipeline/sync/stream.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/pipeline/sync/stream.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/pipeline/sync/skip/tracker.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/pipeline/sync/skip/tracker.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/pipeline/sync/skip/skippable.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/pipeline/sync/skip/skippable.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/pipeline/sync/skip/portal.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/pipeline/sync/skip/portal.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/pipeline/sync/skip/namespace.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/pipeline/sync/skip/namespace.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/pipeline/sync/skip/layout.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/pipeline/sync/skip/layout.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/pipeline/sync/skip/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/pipeline/sync/skip/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/pipeline/sync/pipeline.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/pipeline/sync/pipeline.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/pipeline/sync/pipe.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/pipeline/sync/pipe.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/pipeline/sync/phony.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/pipeline/sync/phony.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/pipeline/sync/microbatch.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/pipeline/sync/microbatch.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/pipeline/sync/dependency.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/pipeline/sync/dependency.py + for 
f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/pipeline/sync/copy.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/pipeline/sync/copy.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/pipeline/sync/checkpoint.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/pipeline/sync/checkpoint.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/pipeline/sync/batchnorm.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/pipeline/sync/batchnorm.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/pipeline/sync/_balance/profile.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/pipeline/sync/_balance/profile.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/pipeline/sync/_balance/blockpartition.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/pipeline/sync/_balance/blockpartition.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/pipeline/sync/_balance/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/pipeline/sync/_balance/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/pipeline/sync/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/pipeline/sync/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/pipeline/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/pipeline/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/optim/zero_redundancy_optimizer.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/optim/zero_redundancy_optimizer.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/optim/utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/optim/utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/optim/post_localSGD_optimizer.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/optim/post_localSGD_optimizer.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/optim/optimizer.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/optim/optimizer.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/optim/named_optimizer.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/optim/named_optimizer.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/optim/functional_sgd.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/optim/functional_sgd.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/optim/functional_rprop.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/optim/functional_rprop.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/optim/functional_rmsprop.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/optim/functional_rmsprop.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/optim/functional_adamw.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/optim/functional_adamw.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/optim/functional_adamax.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/optim/functional_adamax.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/optim/functional_adam.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/optim/functional_adam.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/optim/functional_adagrad.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/optim/functional_adagrad.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/optim/functional_adadelta.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/optim/functional_adadelta.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/optim/apply_optimizer_in_backward.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/optim/apply_optimizer_in_backward.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/optim/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/optim/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/nn/jit/templates/remote_module_template.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/nn/jit/templates/remote_module_template.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/nn/jit/templates/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/nn/jit/templates/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 
./torch/distributed/nn/jit/instantiator.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/nn/jit/instantiator.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/nn/jit/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/nn/jit/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/nn/functional.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/nn/functional.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/nn/api/remote_module.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/nn/api/remote_module.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/nn/api/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/nn/api/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/nn/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/nn/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/logging_handlers.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/logging_handlers.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/launcher/api.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/launcher/api.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/launcher/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/launcher/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/launch.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/launch.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/fsdp/wrap.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/fsdp/wrap.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/fsdp/sharded_grad_scaler.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/fsdp/sharded_grad_scaler.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/fsdp/fully_sharded_data_parallel.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/fsdp/fully_sharded_data_parallel.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/fsdp/api.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/fsdp/api.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/fsdp/_wrap_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/fsdp/_wrap_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/fsdp/_unshard_param_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/fsdp/_unshard_param_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/fsdp/_traversal_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/fsdp/_traversal_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/fsdp/_trace_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/fsdp/_trace_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/fsdp/_state_dict_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/fsdp/_state_dict_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/fsdp/_shard_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/fsdp/_shard_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/fsdp/_runtime_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/fsdp/_runtime_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/fsdp/_optim_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/fsdp/_optim_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/fsdp/_limiter_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/fsdp/_limiter_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/fsdp/_init_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/fsdp/_init_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/fsdp/_fsdp_extensions.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/fsdp/_fsdp_extensions.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/fsdp/_flat_param.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/fsdp/_flat_param.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/fsdp/_exec_order_utils.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/fsdp/_exec_order_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/fsdp/_dynamo_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/fsdp/_dynamo_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/fsdp/_debug_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/fsdp/_debug_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/fsdp/_common_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/fsdp/_common_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/fsdp/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/fsdp/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/examples/memory_tracker_example.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/examples/memory_tracker_example.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/elastic/utils/store.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/elastic/utils/store.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/elastic/utils/logging.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/elastic/utils/logging.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/elastic/utils/log_level.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/elastic/utils/log_level.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/elastic/utils/distributed.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/elastic/utils/distributed.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/elastic/utils/data/elastic_distributed_sampler.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/elastic/utils/data/elastic_distributed_sampler.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/elastic/utils/data/cycling_iterator.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/elastic/utils/data/cycling_iterator.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/elastic/utils/data/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/elastic/utils/data/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 
./torch/distributed/elastic/utils/api.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/elastic/utils/api.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/elastic/utils/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/elastic/utils/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/elastic/timer/local_timer.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/elastic/timer/local_timer.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/elastic/timer/file_based_local_timer.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/elastic/timer/file_based_local_timer.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/elastic/timer/api.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/elastic/timer/api.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/elastic/timer/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/elastic/timer/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/elastic/rendezvous/utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/elastic/rendezvous/utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/elastic/rendezvous/static_tcp_rendezvous.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/elastic/rendezvous/static_tcp_rendezvous.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/elastic/rendezvous/registry.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/elastic/rendezvous/registry.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/elastic/rendezvous/etcd_store.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/elastic/rendezvous/etcd_store.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/elastic/rendezvous/etcd_server.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/elastic/rendezvous/etcd_server.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/elastic/rendezvous/etcd_rendezvous_backend.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/elastic/rendezvous/etcd_rendezvous_backend.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/elastic/rendezvous/etcd_rendezvous.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/elastic/rendezvous/etcd_rendezvous.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/elastic/rendezvous/dynamic_rendezvous.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/elastic/rendezvous/dynamic_rendezvous.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/elastic/rendezvous/c10d_rendezvous_backend.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/elastic/rendezvous/c10d_rendezvous_backend.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/elastic/rendezvous/api.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/elastic/rendezvous/api.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/elastic/rendezvous/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/elastic/rendezvous/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/elastic/multiprocessing/tail_log.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/elastic/multiprocessing/tail_log.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/elastic/multiprocessing/subprocess_handler/subprocess_handler.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/elastic/multiprocessing/subprocess_handler/subprocess_handler.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/elastic/multiprocessing/subprocess_handler/handlers.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/elastic/multiprocessing/subprocess_handler/handlers.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/elastic/multiprocessing/subprocess_handler/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/elastic/multiprocessing/subprocess_handler/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/elastic/multiprocessing/redirects.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/elastic/multiprocessing/redirects.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/elastic/multiprocessing/errors/handlers.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/elastic/multiprocessing/errors/handlers.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/elastic/multiprocessing/errors/error_handler.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/elastic/multiprocessing/errors/error_handler.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/elastic/multiprocessing/errors/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/elastic/multiprocessing/errors/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/elastic/multiprocessing/api.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/elastic/multiprocessing/api.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/elastic/multiprocessing/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/elastic/multiprocessing/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/elastic/metrics/api.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/elastic/metrics/api.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/elastic/metrics/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/elastic/metrics/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/elastic/events/handlers.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/elastic/events/handlers.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/elastic/events/api.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/elastic/events/api.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/elastic/events/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/elastic/events/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/elastic/agent/server/local_elastic_agent.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/elastic/agent/server/local_elastic_agent.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/elastic/agent/server/health_check_server.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/elastic/agent/server/health_check_server.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/elastic/agent/server/api.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/elastic/agent/server/api.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/elastic/agent/server/__init__.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/elastic/agent/server/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/elastic/agent/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/elastic/agent/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/elastic/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/elastic/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/distributed_c10d.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/distributed_c10d.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/device_mesh.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/device_mesh.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/constants.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/constants.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/collective_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/collective_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/checkpoint/utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/checkpoint/utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/checkpoint/storage.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/checkpoint/storage.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/checkpoint/stateful.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/checkpoint/stateful.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/checkpoint/state_dict_saver.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/checkpoint/state_dict_saver.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/checkpoint/state_dict_loader.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/checkpoint/state_dict_loader.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/checkpoint/state_dict.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/checkpoint/state_dict.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/checkpoint/resharding.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/checkpoint/resharding.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/checkpoint/planner_helpers.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/checkpoint/planner_helpers.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/checkpoint/planner.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/checkpoint/planner.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/checkpoint/optimizer.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/checkpoint/optimizer.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/checkpoint/metadata.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/checkpoint/metadata.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/checkpoint/logging_handlers.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/checkpoint/logging_handlers.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/checkpoint/logger.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/checkpoint/logger.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/checkpoint/format_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/checkpoint/format_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/checkpoint/filesystem.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/checkpoint/filesystem.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/checkpoint/examples/stateful_example.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/checkpoint/examples/stateful_example.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/checkpoint/examples/fsdp_checkpoint_example.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/checkpoint/examples/fsdp_checkpoint_example.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/checkpoint/examples/async_checkpointing_example.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/checkpoint/examples/async_checkpointing_example.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/checkpoint/default_planner.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/checkpoint/default_planner.py + for f 
in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/checkpoint/api.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/checkpoint/api.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/checkpoint/_traverse.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/checkpoint/_traverse.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/checkpoint/_storage_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/checkpoint/_storage_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/checkpoint/_sharded_tensor_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/checkpoint/_sharded_tensor_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/checkpoint/_nested_dict.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/checkpoint/_nested_dict.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/checkpoint/_fsspec_filesystem.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/checkpoint/_fsspec_filesystem.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/checkpoint/_dedup_tensors.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/checkpoint/_dedup_tensors.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/checkpoint/_dedup_save_plans.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/checkpoint/_dedup_save_plans.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/checkpoint/_checkpointer.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/checkpoint/_checkpointer.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/checkpoint/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/checkpoint/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/c10d_logger.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/c10d_logger.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/benchmarks/benchmark_ddp_rpc.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/benchmarks/benchmark_ddp_rpc.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/autograd/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/autograd/__init__.py + for f 
in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/argparse_util.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/argparse_util.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/algorithms/model_averaging/utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/algorithms/model_averaging/utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/algorithms/model_averaging/hierarchical_model_averager.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/algorithms/model_averaging/hierarchical_model_averager.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/algorithms/model_averaging/averagers.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/algorithms/model_averaging/averagers.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/algorithms/model_averaging/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/algorithms/model_averaging/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/algorithms/join.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/algorithms/join.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/algorithms/ddp_comm_hooks/quantization_hooks.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/algorithms/ddp_comm_hooks/quantization_hooks.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/algorithms/ddp_comm_hooks/powerSGD_hook.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/algorithms/ddp_comm_hooks/powerSGD_hook.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/algorithms/ddp_comm_hooks/post_localSGD_hook.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/algorithms/ddp_comm_hooks/post_localSGD_hook.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/algorithms/ddp_comm_hooks/optimizer_overlap_hooks.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/algorithms/ddp_comm_hooks/optimizer_overlap_hooks.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/algorithms/ddp_comm_hooks/mixed_precision_hooks.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/algorithms/ddp_comm_hooks/mixed_precision_hooks.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/algorithms/ddp_comm_hooks/default_hooks.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/algorithms/ddp_comm_hooks/default_hooks.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/algorithms/ddp_comm_hooks/debugging_hooks.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/algorithms/ddp_comm_hooks/debugging_hooks.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/algorithms/ddp_comm_hooks/ddp_zero_hook.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/algorithms/ddp_comm_hooks/ddp_zero_hook.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/algorithms/ddp_comm_hooks/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/algorithms/ddp_comm_hooks/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/algorithms/_quantization/quantization.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/algorithms/_quantization/quantization.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/algorithms/_quantization/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/algorithms/_quantization/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/algorithms/_optimizer_overlap/optimizer_overlap.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/algorithms/_optimizer_overlap/optimizer_overlap.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/algorithms/_optimizer_overlap/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/algorithms/_optimizer_overlap/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/algorithms/_comm_hooks/default_hooks.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/algorithms/_comm_hooks/default_hooks.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/algorithms/_comm_hooks/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/algorithms/_comm_hooks/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/algorithms/_checkpoint/checkpoint_wrapper.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/algorithms/_checkpoint/checkpoint_wrapper.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/algorithms/_checkpoint/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/algorithms/_checkpoint/__init__.py + for f in `find ./torch/ -name '*.py'` + 
install -D -pm 644 ./torch/distributed/algorithms/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/algorithms/__init__.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/distributed/_tools/memory_tracker.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_tools/memory_tracker.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/distributed/_tools/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_tools/__init__.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/distributed/_tensor/tp_conv.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_tensor/tp_conv.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/distributed/_tensor/sharding_prop.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_tensor/sharding_prop.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/distributed/_tensor/redistribute.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_tensor/redistribute.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/distributed/_tensor/random.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_tensor/random.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/distributed/_tensor/placement_types.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_tensor/placement_types.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/distributed/_tensor/ops/view_ops.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_tensor/ops/view_ops.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/distributed/_tensor/ops/utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_tensor/ops/utils.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/distributed/_tensor/ops/tensor_ops.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_tensor/ops/tensor_ops.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/distributed/_tensor/ops/random_ops.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_tensor/ops/random_ops.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/distributed/_tensor/ops/pointwise_ops.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_tensor/ops/pointwise_ops.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/distributed/_tensor/ops/matrix_ops.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_tensor/ops/matrix_ops.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/distributed/_tensor/ops/math_ops.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_tensor/ops/math_ops.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/distributed/_tensor/ops/experimental_ops.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_tensor/ops/experimental_ops.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/distributed/_tensor/ops/embedding_ops.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_tensor/ops/embedding_ops.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/distributed/_tensor/ops/conv_ops.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_tensor/ops/conv_ops.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/distributed/_tensor/ops/common_rules.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_tensor/ops/common_rules.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/distributed/_tensor/ops/basic_strategy.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_tensor/ops/basic_strategy.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/distributed/_tensor/ops/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_tensor/ops/__init__.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/distributed/_tensor/op_schema.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_tensor/op_schema.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/distributed/_tensor/experimental/tp_transform.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_tensor/experimental/tp_transform.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/distributed/_tensor/experimental/attention.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_tensor/experimental/attention.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/distributed/_tensor/experimental/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_tensor/experimental/__init__.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/distributed/_tensor/examples/visualize_sharding_example.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_tensor/examples/visualize_sharding_example.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/distributed/_tensor/examples/torchrec_sharding_example.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_tensor/examples/torchrec_sharding_example.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/distributed/_tensor/examples/convnext_example.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_tensor/examples/convnext_example.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/distributed/_tensor/examples/checkpoint_example.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_tensor/examples/checkpoint_example.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/distributed/_tensor/dispatch.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_tensor/dispatch.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/distributed/_tensor/device_mesh.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_tensor/device_mesh.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/distributed/_tensor/debug/visualize_sharding.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_tensor/debug/visualize_sharding.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/distributed/_tensor/debug/op_coverage.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_tensor/debug/op_coverage.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/distributed/_tensor/debug/comm_mode.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_tensor/debug/comm_mode.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/distributed/_tensor/debug/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_tensor/debug/__init__.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/distributed/_tensor/api.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_tensor/api.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/distributed/_tensor/_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_tensor/_utils.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/distributed/_tensor/_collective_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_tensor/_collective_utils.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/distributed/_tensor/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_tensor/__init__.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/distributed/_state_dict_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_state_dict_utils.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/distributed/_spmd/partial_lower.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_spmd/partial_lower.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/distributed/_spmd/parallel_mode.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_spmd/parallel_mode.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/distributed/_spmd/log_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_spmd/log_utils.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/distributed/_spmd/iter_graph_module.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_spmd/iter_graph_module.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/distributed/_spmd/graph_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_spmd/graph_utils.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/distributed/_spmd/graph_optimization.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_spmd/graph_optimization.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/distributed/_spmd/gm_transformation.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_spmd/gm_transformation.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/distributed/_spmd/experimental_ops.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_spmd/experimental_ops.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/distributed/_spmd/distribute.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_spmd/distribute.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/distributed/_spmd/data_parallel.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_spmd/data_parallel.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/distributed/_spmd/config.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_spmd/config.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/distributed/_spmd/comm_tensor.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_spmd/comm_tensor.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/distributed/_spmd/batch_dim_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_spmd/batch_dim_utils.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/distributed/_spmd/api.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_spmd/api.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/distributed/_spmd/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_spmd/__init__.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/distributed/_sharding_spec/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_sharding_spec/__init__.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/distributed/_sharded_tensor/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_sharded_tensor/__init__.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/distributed/_shard/sharding_spec/chunk_sharding_spec_ops/embedding_bag.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_shard/sharding_spec/chunk_sharding_spec_ops/embedding_bag.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/distributed/_shard/sharding_spec/chunk_sharding_spec_ops/embedding.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_shard/sharding_spec/chunk_sharding_spec_ops/embedding.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/distributed/_shard/sharding_spec/chunk_sharding_spec_ops/_common.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_shard/sharding_spec/chunk_sharding_spec_ops/_common.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/distributed/_shard/sharding_spec/chunk_sharding_spec_ops/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_shard/sharding_spec/chunk_sharding_spec_ops/__init__.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/distributed/_shard/sharding_spec/chunk_sharding_spec.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_shard/sharding_spec/chunk_sharding_spec.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/distributed/_shard/sharding_spec/api.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_shard/sharding_spec/api.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/distributed/_shard/sharding_spec/_internals.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_shard/sharding_spec/_internals.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/distributed/_shard/sharding_spec/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_shard/sharding_spec/__init__.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/distributed/_shard/sharding_plan/api.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_shard/sharding_plan/api.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/distributed/_shard/sharding_plan/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_shard/sharding_plan/__init__.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/distributed/_shard/sharder.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_shard/sharder.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/distributed/_shard/sharded_tensor/utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_shard/sharded_tensor/utils.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/distributed/_shard/sharded_tensor/shard.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_shard/sharded_tensor/shard.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/distributed/_shard/sharded_tensor/reshard.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_shard/sharded_tensor/reshard.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/distributed/_shard/sharded_tensor/metadata.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_shard/sharded_tensor/metadata.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/distributed/_shard/sharded_tensor/logging_handlers.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_shard/sharded_tensor/logging_handlers.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/distributed/_shard/sharded_tensor/logger.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_shard/sharded_tensor/logger.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/distributed/_shard/sharded_tensor/api.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_shard/sharded_tensor/api.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/distributed/_shard/sharded_tensor/_ops/tensor_ops.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_shard/sharded_tensor/_ops/tensor_ops.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/distributed/_shard/sharded_tensor/_ops/misc_ops.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_shard/sharded_tensor/_ops/misc_ops.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/distributed/_shard/sharded_tensor/_ops/init.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_shard/sharded_tensor/_ops/init.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/distributed/_shard/sharded_tensor/_ops/binary_cmp.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_shard/sharded_tensor/_ops/binary_cmp.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/distributed/_shard/sharded_tensor/_ops/_common.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_shard/sharded_tensor/_ops/_common.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/distributed/_shard/sharded_tensor/_ops/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_shard/sharded_tensor/_ops/__init__.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/distributed/_shard/sharded_tensor/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_shard/sharded_tensor/__init__.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/distributed/_shard/sharded_optim/api.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_shard/sharded_optim/api.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/distributed/_shard/sharded_optim/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_shard/sharded_optim/__init__.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/distributed/_shard/op_registry_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_shard/op_registry_utils.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/distributed/_shard/metadata.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_shard/metadata.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/distributed/_shard/common_op_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_shard/common_op_utils.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/distributed/_shard/checkpoint/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_shard/checkpoint/__init__.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/distributed/_shard/api.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_shard/api.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/distributed/_shard/_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_shard/_utils.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/distributed/_shard/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_shard/__init__.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/distributed/_functional_collectives_impl.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_functional_collectives_impl.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/distributed/_functional_collectives.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_functional_collectives.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/distributed/_composable_state.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_composable_state.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/distributed/_composable/replicate.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_composable/replicate.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/distributed/_composable/fully_shard.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_composable/fully_shard.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/distributed/_composable/fsdp/fully_shard.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_composable/fsdp/fully_shard.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/distributed/_composable/fsdp/_fsdp_state.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_composable/fsdp/_fsdp_state.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/distributed/_composable/fsdp/_fsdp_param_group.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_composable/fsdp/_fsdp_param_group.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/distributed/_composable/fsdp/_fsdp_param.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_composable/fsdp/_fsdp_param.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/distributed/_composable/fsdp/_fsdp_init.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_composable/fsdp/_fsdp_init.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/distributed/_composable/fsdp/_fsdp_common.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_composable/fsdp/_fsdp_common.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/distributed/_composable/fsdp/_fsdp_collectives.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_composable/fsdp/_fsdp_collectives.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/distributed/_composable/fsdp/_fsdp_api.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_composable/fsdp/_fsdp_api.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/distributed/_composable/fsdp/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_composable/fsdp/__init__.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/distributed/_composable/contract.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_composable/contract.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/distributed/_composable/checkpoint_activation.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_composable/checkpoint_activation.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/distributed/_composable/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_composable/__init__.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/distributed/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/__init__.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/cuda/streams.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/cuda/streams.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/cuda/sparse.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/cuda/sparse.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/cuda/random.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/cuda/random.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/cuda/profiler.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/cuda/profiler.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/cuda/nvtx.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/cuda/nvtx.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/cuda/nccl.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/cuda/nccl.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/cuda/memory.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/cuda/memory.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/cuda/jiterator.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/cuda/jiterator.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/cuda/graphs.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/cuda/graphs.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/cuda/error.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/cuda/error.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/cuda/comm.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/cuda/comm.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/cuda/amp/grad_scaler.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/cuda/amp/grad_scaler.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/cuda/amp/common.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/cuda/amp/common.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/cuda/amp/autocast_mode.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/cuda/amp/autocast_mode.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/cuda/amp/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/cuda/amp/__init__.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/cuda/_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/cuda/_utils.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/cuda/_sanitizer.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/cuda/_sanitizer.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/cuda/_memory_viz.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/cuda/_memory_viz.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/cuda/_gpu_trace.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/cuda/_gpu_trace.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/cuda/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/cuda/__init__.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/csrc/lazy/test_mnist.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/csrc/lazy/test_mnist.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/csrc/jit/tensorexpr/scripts/bisect.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/csrc/jit/tensorexpr/scripts/bisect.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/csrc/jit/tensorexpr/codegen_external.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/csrc/jit/tensorexpr/codegen_external.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/cpu/amp/grad_scaler.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/cpu/amp/grad_scaler.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/cpu/amp/autocast_mode.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/cpu/amp/autocast_mode.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/cpu/amp/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/cpu/amp/__init__.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/cpu/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/cpu/__init__.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/contrib/_tensorboard_vis.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/contrib/_tensorboard_vis.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/contrib/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/contrib/__init__.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/compiler/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/compiler/__init__.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/backends/xnnpack/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/backends/xnnpack/__init__.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/backends/xeon/run_cpu.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/backends/xeon/run_cpu.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/backends/xeon/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/backends/xeon/__init__.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/backends/quantized/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/backends/quantized/__init__.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/backends/opt_einsum/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/backends/opt_einsum/__init__.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/backends/openmp/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/backends/openmp/__init__.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/backends/nnpack/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/backends/nnpack/__init__.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/backends/mps/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/backends/mps/__init__.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/backends/mkldnn/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/backends/mkldnn/__init__.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/backends/mkl/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/backends/mkl/__init__.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/backends/mha/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/backends/mha/__init__.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/backends/cudnn/rnn.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/backends/cudnn/rnn.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/backends/cudnn/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/backends/cudnn/__init__.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/backends/cuda/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/backends/cuda/__init__.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/backends/cpu/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/backends/cpu/__init__.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/backends/_nnapi/serializer.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/backends/_nnapi/serializer.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/backends/_nnapi/prepare.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/backends/_nnapi/prepare.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/backends/_nnapi/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/backends/_nnapi/__init__.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/backends/_coreml/preprocess.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/backends/_coreml/preprocess.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/backends/_coreml/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/backends/_coreml/__init__.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/backends/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/backends/__init__.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/autograd/variable.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/autograd/variable.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/autograd/profiler_util.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/autograd/profiler_util.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/autograd/profiler_legacy.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/autograd/profiler_legacy.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/autograd/profiler.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/autograd/profiler.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/autograd/graph.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/autograd/graph.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/autograd/gradcheck.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/autograd/gradcheck.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/autograd/grad_mode.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/autograd/grad_mode.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/autograd/functional.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/autograd/functional.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/autograd/function.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/autograd/function.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/autograd/forward_ad.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/autograd/forward_ad.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/autograd/anomaly_mode.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/autograd/anomaly_mode.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/autograd/_functions/utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/autograd/_functions/utils.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/autograd/_functions/tensor.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/autograd/_functions/tensor.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/autograd/_functions/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/autograd/_functions/__init__.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/autograd/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/autograd/__init__.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/ao/quantization/utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/utils.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/ao/quantization/stubs.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/stubs.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/ao/quantization/quantizer/xnnpack_quantizer_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/quantizer/xnnpack_quantizer_utils.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/ao/quantization/quantizer/xnnpack_quantizer.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/quantizer/xnnpack_quantizer.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/ao/quantization/quantizer/x86_inductor_quantizer.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/quantizer/x86_inductor_quantizer.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/ao/quantization/quantizer/utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/quantizer/utils.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/ao/quantization/quantizer/quantizer.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/quantizer/quantizer.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/ao/quantization/quantizer/embedding_quantizer.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/quantizer/embedding_quantizer.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/ao/quantization/quantizer/composable_quantizer.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/quantizer/composable_quantizer.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/ao/quantization/quantizer/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/quantizer/__init__.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/ao/quantization/quantize_pt2e.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/quantize_pt2e.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/ao/quantization/quantize_jit.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/quantize_jit.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/ao/quantization/quantize_fx.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/quantize_fx.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/ao/quantization/quantize.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/quantize.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/ao/quantization/quantization_mappings.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/quantization_mappings.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/ao/quantization/quant_type.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/quant_type.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/ao/quantization/qconfig_mapping.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/qconfig_mapping.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/ao/quantization/qconfig.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/qconfig.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/ao/quantization/pt2e/utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/pt2e/utils.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/ao/quantization/pt2e/representation/rewrite.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/pt2e/representation/rewrite.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/ao/quantization/pt2e/representation/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/pt2e/representation/__init__.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/ao/quantization/pt2e/qat_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/pt2e/qat_utils.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/ao/quantization/pt2e/prepare.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/pt2e/prepare.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/ao/quantization/pt2e/port_metadata_pass.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/pt2e/port_metadata_pass.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/ao/quantization/pt2e/graph_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/pt2e/graph_utils.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/ao/quantization/pt2e/generate_numeric_debug_handle.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/pt2e/generate_numeric_debug_handle.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/ao/quantization/pt2e/export_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/pt2e/export_utils.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/ao/quantization/pt2e/duplicate_dq_pass.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/pt2e/duplicate_dq_pass.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/ao/quantization/pt2e/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/pt2e/__init__.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/ao/quantization/observer.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/observer.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/ao/quantization/fx/utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/fx/utils.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/ao/quantization/fx/tracer.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/fx/tracer.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/ao/quantization/fx/quantize_handler.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/fx/quantize_handler.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/ao/quantization/fx/qconfig_mapping_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/fx/qconfig_mapping_utils.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/ao/quantization/fx/prepare.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/fx/prepare.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/ao/quantization/fx/pattern_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/fx/pattern_utils.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/ao/quantization/fx/match_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/fx/match_utils.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/ao/quantization/fx/lstm_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/fx/lstm_utils.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/ao/quantization/fx/lower_to_qnnpack.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/fx/lower_to_qnnpack.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/ao/quantization/fx/lower_to_fbgemm.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/fx/lower_to_fbgemm.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/ao/quantization/fx/graph_module.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/fx/graph_module.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/ao/quantization/fx/fuse_handler.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/fx/fuse_handler.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/ao/quantization/fx/fuse.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/fx/fuse.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/ao/quantization/fx/custom_config.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/fx/custom_config.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/ao/quantization/fx/convert.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/fx/convert.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/ao/quantization/fx/_model_report/model_report_visualizer.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/fx/_model_report/model_report_visualizer.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/ao/quantization/fx/_model_report/model_report_observer.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/fx/_model_report/model_report_observer.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/ao/quantization/fx/_model_report/model_report.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/fx/_model_report/model_report.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/ao/quantization/fx/_model_report/detector.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/fx/_model_report/detector.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/ao/quantization/fx/_model_report/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/fx/_model_report/__init__.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/ao/quantization/fx/_lower_to_native_backend.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/fx/_lower_to_native_backend.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/ao/quantization/fx/_equalize.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/fx/_equalize.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/ao/quantization/fx/_decomposed.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/fx/_decomposed.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/ao/quantization/fx/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/fx/__init__.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/ao/quantization/fuser_method_mappings.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/fuser_method_mappings.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/ao/quantization/fuse_modules.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/fuse_modules.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/ao/quantization/fake_quantize.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/fake_quantize.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/ao/quantization/experimental/quantizer.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/experimental/quantizer.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/ao/quantization/experimental/qconfig.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/experimental/qconfig.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/ao/quantization/experimental/observer.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/experimental/observer.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/ao/quantization/experimental/linear.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/experimental/linear.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/ao/quantization/experimental/fake_quantize_function.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/experimental/fake_quantize_function.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/ao/quantization/experimental/fake_quantize.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/experimental/fake_quantize.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/ao/quantization/experimental/apot_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/experimental/apot_utils.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/ao/quantization/experimental/APoT_tensor.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/experimental/APoT_tensor.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/ao/quantization/backend_config/x86.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/backend_config/x86.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/ao/quantization/backend_config/utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/backend_config/utils.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/ao/quantization/backend_config/tensorrt.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/backend_config/tensorrt.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/ao/quantization/backend_config/qnnpack.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/backend_config/qnnpack.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/ao/quantization/backend_config/onednn.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/backend_config/onednn.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/ao/quantization/backend_config/observation_type.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/backend_config/observation_type.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/ao/quantization/backend_config/native.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/backend_config/native.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/ao/quantization/backend_config/fbgemm.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/backend_config/fbgemm.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/ao/quantization/backend_config/executorch.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/backend_config/executorch.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/ao/quantization/backend_config/backend_config.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/backend_config/backend_config.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/ao/quantization/backend_config/_qnnpack_pt2e.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/backend_config/_qnnpack_pt2e.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/ao/quantization/backend_config/_common_operator_config_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/backend_config/_common_operator_config_utils.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/ao/quantization/backend_config/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/backend_config/__init__.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/ao/quantization/_learnable_fake_quantize.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/_learnable_fake_quantize.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/ao/quantization/_equalize.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/_equalize.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/ao/quantization/_correct_bias.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/_correct_bias.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/ao/quantization/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/__init__.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/ao/pruning/sparsifier/weight_norm_sparsifier.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/pruning/sparsifier/weight_norm_sparsifier.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/ao/pruning/sparsifier/utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/pruning/sparsifier/utils.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/ao/pruning/sparsifier/nearly_diagonal_sparsifier.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/pruning/sparsifier/nearly_diagonal_sparsifier.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/ao/pruning/sparsifier/base_sparsifier.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/pruning/sparsifier/base_sparsifier.py
`find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/pruning/sparsifier/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/pruning/sparsifier/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/pruning/scheduler/lambda_scheduler.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/pruning/scheduler/lambda_scheduler.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/pruning/scheduler/cubic_scheduler.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/pruning/scheduler/cubic_scheduler.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/pruning/scheduler/base_scheduler.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/pruning/scheduler/base_scheduler.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/pruning/scheduler/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/pruning/scheduler/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/pruning/_mappings.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/pruning/_mappings.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/pruning/_experimental/pruner/saliency_pruner.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/pruning/_experimental/pruner/saliency_pruner.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/pruning/_experimental/pruner/prune_functions.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/pruning/_experimental/pruner/prune_functions.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/pruning/_experimental/pruner/parametrization.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/pruning/_experimental/pruner/parametrization.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/pruning/_experimental/pruner/match_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/pruning/_experimental/pruner/match_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/pruning/_experimental/pruner/lstm_saliency_pruner.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/pruning/_experimental/pruner/lstm_saliency_pruner.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/pruning/_experimental/pruner/base_structured_sparsifier.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/pruning/_experimental/pruner/base_structured_sparsifier.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/pruning/_experimental/pruner/__init__.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/pruning/_experimental/pruner/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/pruning/_experimental/pruner/FPGM_pruner.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/pruning/_experimental/pruner/FPGM_pruner.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/pruning/_experimental/data_sparsifier/quantization_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/pruning/_experimental/data_sparsifier/quantization_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/pruning/_experimental/data_sparsifier/lightning/tests/test_callbacks.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/pruning/_experimental/data_sparsifier/lightning/tests/test_callbacks.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/pruning/_experimental/data_sparsifier/lightning/callbacks/data_sparsity.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/pruning/_experimental/data_sparsifier/lightning/callbacks/data_sparsity.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/pruning/_experimental/data_sparsifier/lightning/callbacks/_data_sparstity_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/pruning/_experimental/data_sparsifier/lightning/callbacks/_data_sparstity_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/pruning/_experimental/data_sparsifier/lightning/callbacks/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/pruning/_experimental/data_sparsifier/lightning/callbacks/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/pruning/_experimental/data_sparsifier/lightning/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/pruning/_experimental/data_sparsifier/lightning/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/pruning/_experimental/data_sparsifier/data_norm_sparsifier.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/pruning/_experimental/data_sparsifier/data_norm_sparsifier.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/pruning/_experimental/data_sparsifier/benchmarks/evaluate_model_metrics.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/pruning/_experimental/data_sparsifier/benchmarks/evaluate_model_metrics.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/pruning/_experimental/data_sparsifier/benchmarks/evaluate_forward_time.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/pruning/_experimental/data_sparsifier/benchmarks/evaluate_forward_time.py + for f in 
`find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/pruning/_experimental/data_sparsifier/benchmarks/evaluate_disk_savings.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/pruning/_experimental/data_sparsifier/benchmarks/evaluate_disk_savings.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/pruning/_experimental/data_sparsifier/benchmarks/dlrm_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/pruning/_experimental/data_sparsifier/benchmarks/dlrm_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/pruning/_experimental/data_sparsifier/base_data_sparsifier.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/pruning/_experimental/data_sparsifier/base_data_sparsifier.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/pruning/_experimental/data_sparsifier/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/pruning/_experimental/data_sparsifier/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/pruning/_experimental/data_scheduler/base_data_scheduler.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/pruning/_experimental/data_scheduler/base_data_scheduler.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/pruning/_experimental/data_scheduler/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/pruning/_experimental/data_scheduler/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/pruning/_experimental/activation_sparsifier/activation_sparsifier.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/pruning/_experimental/activation_sparsifier/activation_sparsifier.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/pruning/_experimental/activation_sparsifier/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/pruning/_experimental/activation_sparsifier/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/pruning/_experimental/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/pruning/_experimental/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/pruning/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/pruning/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/ns/fx/weight_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/ns/fx/weight_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/ns/fx/utils.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/ns/fx/utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/ns/fx/qconfig_multi_mapping.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/ns/fx/qconfig_multi_mapping.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/ns/fx/pattern_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/ns/fx/pattern_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/ns/fx/ns_types.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/ns/fx/ns_types.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/ns/fx/n_shadows_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/ns/fx/n_shadows_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/ns/fx/mappings.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/ns/fx/mappings.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/ns/fx/graph_passes.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/ns/fx/graph_passes.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/ns/fx/graph_matcher.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/ns/fx/graph_matcher.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/ns/fx/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/ns/fx/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/ns/_numeric_suite_fx.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/ns/_numeric_suite_fx.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/ns/_numeric_suite.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/ns/_numeric_suite.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/ns/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/ns/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/nn/sparse/quantized/utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/nn/sparse/quantized/utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/nn/sparse/quantized/linear.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/nn/sparse/quantized/linear.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/nn/sparse/quantized/dynamic/linear.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/nn/sparse/quantized/dynamic/linear.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/nn/sparse/quantized/dynamic/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/nn/sparse/quantized/dynamic/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/nn/sparse/quantized/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/nn/sparse/quantized/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/nn/sparse/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/nn/sparse/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/nn/quantized/reference/modules/utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/nn/quantized/reference/modules/utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/nn/quantized/reference/modules/sparse.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/nn/quantized/reference/modules/sparse.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/nn/quantized/reference/modules/rnn.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/nn/quantized/reference/modules/rnn.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/nn/quantized/reference/modules/linear.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/nn/quantized/reference/modules/linear.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/nn/quantized/reference/modules/conv.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/nn/quantized/reference/modules/conv.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/nn/quantized/reference/modules/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/nn/quantized/reference/modules/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/nn/quantized/reference/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/nn/quantized/reference/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/nn/quantized/modules/utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/nn/quantized/modules/utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/nn/quantized/modules/rnn.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/nn/quantized/modules/rnn.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 
./torch/ao/nn/quantized/modules/normalization.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/nn/quantized/modules/normalization.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/nn/quantized/modules/linear.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/nn/quantized/modules/linear.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/nn/quantized/modules/functional_modules.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/nn/quantized/modules/functional_modules.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/nn/quantized/modules/embedding_ops.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/nn/quantized/modules/embedding_ops.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/nn/quantized/modules/dropout.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/nn/quantized/modules/dropout.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/nn/quantized/modules/conv.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/nn/quantized/modules/conv.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/nn/quantized/modules/batchnorm.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/nn/quantized/modules/batchnorm.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/nn/quantized/modules/activation.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/nn/quantized/modules/activation.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/nn/quantized/modules/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/nn/quantized/modules/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/nn/quantized/functional.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/nn/quantized/functional.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/nn/quantized/dynamic/modules/rnn.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/nn/quantized/dynamic/modules/rnn.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/nn/quantized/dynamic/modules/linear.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/nn/quantized/dynamic/modules/linear.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/nn/quantized/dynamic/modules/conv.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/nn/quantized/dynamic/modules/conv.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 
./torch/ao/nn/quantized/dynamic/modules/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/nn/quantized/dynamic/modules/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/nn/quantized/dynamic/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/nn/quantized/dynamic/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/nn/quantized/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/nn/quantized/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/nn/quantizable/modules/rnn.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/nn/quantizable/modules/rnn.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/nn/quantizable/modules/activation.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/nn/quantizable/modules/activation.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/nn/quantizable/modules/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/nn/quantizable/modules/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/nn/quantizable/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/nn/quantizable/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/nn/qat/modules/linear.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/nn/qat/modules/linear.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/nn/qat/modules/embedding_ops.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/nn/qat/modules/embedding_ops.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/nn/qat/modules/conv.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/nn/qat/modules/conv.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/nn/qat/modules/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/nn/qat/modules/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/nn/qat/dynamic/modules/linear.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/nn/qat/dynamic/modules/linear.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/nn/qat/dynamic/modules/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/nn/qat/dynamic/modules/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/nn/qat/dynamic/__init__.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/nn/qat/dynamic/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/nn/qat/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/nn/qat/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/nn/intrinsic/quantized/modules/linear_relu.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/nn/intrinsic/quantized/modules/linear_relu.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/nn/intrinsic/quantized/modules/conv_relu.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/nn/intrinsic/quantized/modules/conv_relu.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/nn/intrinsic/quantized/modules/conv_add.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/nn/intrinsic/quantized/modules/conv_add.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/nn/intrinsic/quantized/modules/bn_relu.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/nn/intrinsic/quantized/modules/bn_relu.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/nn/intrinsic/quantized/modules/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/nn/intrinsic/quantized/modules/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/nn/intrinsic/quantized/dynamic/modules/linear_relu.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/nn/intrinsic/quantized/dynamic/modules/linear_relu.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/nn/intrinsic/quantized/dynamic/modules/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/nn/intrinsic/quantized/dynamic/modules/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/nn/intrinsic/quantized/dynamic/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/nn/intrinsic/quantized/dynamic/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/nn/intrinsic/quantized/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/nn/intrinsic/quantized/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/nn/intrinsic/qat/modules/linear_relu.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/nn/intrinsic/qat/modules/linear_relu.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/nn/intrinsic/qat/modules/linear_fused.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/nn/intrinsic/qat/modules/linear_fused.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/nn/intrinsic/qat/modules/conv_fused.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/nn/intrinsic/qat/modules/conv_fused.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/nn/intrinsic/qat/modules/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/nn/intrinsic/qat/modules/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/nn/intrinsic/qat/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/nn/intrinsic/qat/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/nn/intrinsic/modules/fused.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/nn/intrinsic/modules/fused.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/nn/intrinsic/modules/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/nn/intrinsic/modules/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/nn/intrinsic/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/nn/intrinsic/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/nn/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/nn/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/amp/grad_scaler.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/amp/grad_scaler.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/amp/autocast_mode.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/amp/autocast_mode.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/amp/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/amp/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_weights_only_unpickler.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/_weights_only_unpickler.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_vmap_internals.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/_vmap_internals.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_vendor/packaging/version.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/_vendor/packaging/version.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_vendor/packaging/_structures.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/_vendor/packaging/_structures.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_vendor/packaging/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/_vendor/packaging/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_vendor/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/_vendor/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_utils_internal.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/_utils_internal.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_torch_docs.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/_torch_docs.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_tensor_str.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/_tensor_str.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_tensor_docs.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/_tensor_docs.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_tensor.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/_tensor.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_subclasses/schema_check_mode.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/_subclasses/schema_check_mode.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_subclasses/meta_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/_subclasses/meta_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_subclasses/functional_tensor.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/_subclasses/functional_tensor.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_subclasses/fake_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/_subclasses/fake_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_subclasses/fake_tensor.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/_subclasses/fake_tensor.py + for f in `find ./torch/ 
-name '*.py'` + install -D -pm 644 ./torch/_subclasses/fake_impls.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/_subclasses/fake_impls.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_subclasses/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/_subclasses/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_streambase.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/_streambase.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_storage_docs.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/_storage_docs.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_sources.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/_sources.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_refs/special/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/_refs/special/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_refs/nn/functional/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/_refs/nn/functional/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_refs/nn/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/_refs/nn/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_refs/linalg/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/_refs/linalg/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_refs/fft.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/_refs/fft.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_refs/_conversions.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/_refs/_conversions.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_refs/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/_refs/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_python_dispatcher.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/_python_dispatcher.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_prims_common/wrappers.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/_prims_common/wrappers.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_prims_common/__init__.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/_prims_common/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_prims/rng_prims.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/_prims/rng_prims.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_prims/executor.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/_prims/executor.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_prims/debug_prims.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/_prims/debug_prims.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_prims/context.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/_prims/context.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_prims/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/_prims/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_ops.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/_ops.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_numpy/testing/utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/_numpy/testing/utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_numpy/testing/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/_numpy/testing/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_numpy/random.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/_numpy/random.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_numpy/linalg.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/_numpy/linalg.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_numpy/fft.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/_numpy/fft.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_numpy/_util.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/_numpy/_util.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_numpy/_unary_ufuncs_impl.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/_numpy/_unary_ufuncs_impl.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_numpy/_ufuncs.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/_numpy/_ufuncs.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_numpy/_reductions_impl.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/_numpy/_reductions_impl.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_numpy/_normalizations.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/_numpy/_normalizations.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_numpy/_ndarray.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/_numpy/_ndarray.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_numpy/_getlimits.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/_numpy/_getlimits.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_numpy/_funcs_impl.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/_numpy/_funcs_impl.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_numpy/_funcs.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/_numpy/_funcs.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_numpy/_dtypes_impl.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/_numpy/_dtypes_impl.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_numpy/_dtypes.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/_numpy/_dtypes.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_numpy/_casting_dicts.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/_numpy/_casting_dicts.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_numpy/_binary_ufuncs_impl.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/_numpy/_binary_ufuncs_impl.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_numpy/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/_numpy/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_namedtensor_internals.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/_namedtensor_internals.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_meta_registrations.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/_meta_registrations.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_lowrank.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/_lowrank.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_logging/structured.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/_logging/structured.py + for f in `find ./torch/ -name '*.py'` + 
install -D -pm 644 ./torch/_logging/_registrations.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/_logging/_registrations.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_logging/_internal.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/_logging/_internal.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_logging/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/_logging/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_lobpcg.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/_lobpcg.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_linalg_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/_linalg_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_library/utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/_library/utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_library/simple_registry.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/_library/simple_registry.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_library/fake_class_registry.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/_library/fake_class_registry.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_library/custom_ops.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/_library/custom_ops.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_library/autograd.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/_library/autograd.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_library/abstract_impl.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/_library/abstract_impl.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_library/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/_library/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_lazy/ts_backend.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/_lazy/ts_backend.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_lazy/tensor_factory_functions.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/_lazy/tensor_factory_functions.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_lazy/metrics.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/_lazy/metrics.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_lazy/ir_cache.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/_lazy/ir_cache.py
[the same two xtrace lines repeat for every remaining file matched by the find: the rest of ./torch/_lazy/, ./torch/_jit_internal.py, ./torch/_inductor/ (including kernel/, fx_passes/, fx_passes/serialized_patterns/, codegen/, codegen/cuda/, codegen/cuda/cutlass_lib_extensions/, and codegen/xpu/), ./torch/_higher_order_ops/, ./torch/_guards.py, ./torch/_functorch/ (including _aot_autograd/), ./torch/_export/ (including serde/, passes/, pass_infra/, db/, and db/examples/), and ./torch/_dynamo/variables/ plus the top-level ./torch/_dynamo/ modules; each file is installed with install -D -pm 644 to the identical relative path under /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/]
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_dynamo/test_minifier_common.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/_dynamo/test_minifier_common.py
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/_dynamo/test_minifier_common.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_dynamo/test_case.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/_dynamo/test_case.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_dynamo/tensor_version_op.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/_dynamo/tensor_version_op.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_dynamo/symbolic_convert.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/_dynamo/symbolic_convert.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_dynamo/source.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/_dynamo/source.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_dynamo/side_effects.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/_dynamo/side_effects.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_dynamo/resume_execution.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/_dynamo/resume_execution.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_dynamo/repro/after_dynamo.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/_dynamo/repro/after_dynamo.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_dynamo/repro/after_aot.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/_dynamo/repro/after_aot.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_dynamo/repro/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/_dynamo/repro/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_dynamo/replay_record.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/_dynamo/replay_record.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_dynamo/profiler.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/_dynamo/profiler.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_dynamo/polyfill.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/_dynamo/polyfill.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_dynamo/output_graph.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/_dynamo/output_graph.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_dynamo/mutation_guard.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/_dynamo/mutation_guard.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_dynamo/logging.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/_dynamo/logging.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_dynamo/hooks.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/_dynamo/hooks.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_dynamo/guards.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/_dynamo/guards.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_dynamo/funcname_cache.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/_dynamo/funcname_cache.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_dynamo/external_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/_dynamo/external_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_dynamo/exc.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/_dynamo/exc.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_dynamo/eval_frame.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/_dynamo/eval_frame.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_dynamo/device_interface.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/_dynamo/device_interface.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_dynamo/decorators.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/_dynamo/decorators.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_dynamo/debug_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/_dynamo/debug_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_dynamo/current_scope_id.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/_dynamo/current_scope_id.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_dynamo/create_parameter_op.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/_dynamo/create_parameter_op.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_dynamo/convert_frame.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/_dynamo/convert_frame.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_dynamo/config.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/_dynamo/config.py + for f in `find 
./torch/ -name '*.py'` + install -D -pm 644 ./torch/_dynamo/comptime.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/_dynamo/comptime.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_dynamo/compiled_autograd.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/_dynamo/compiled_autograd.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_dynamo/codegen.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/_dynamo/codegen.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_dynamo/code_context.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/_dynamo/code_context.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_dynamo/callback.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/_dynamo/callback.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_dynamo/cache_size.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/_dynamo/cache_size.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_dynamo/bytecode_transformation.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/_dynamo/bytecode_transformation.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_dynamo/bytecode_analysis.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/_dynamo/bytecode_analysis.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_dynamo/backends/tvm.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/_dynamo/backends/tvm.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_dynamo/backends/torchxla.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/_dynamo/backends/torchxla.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_dynamo/backends/tensorrt.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/_dynamo/backends/tensorrt.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_dynamo/backends/registry.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/_dynamo/backends/registry.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_dynamo/backends/onnxrt.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/_dynamo/backends/onnxrt.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_dynamo/backends/inductor.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/_dynamo/backends/inductor.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 
./torch/_dynamo/backends/distributed.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/_dynamo/backends/distributed.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_dynamo/backends/debugging.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/_dynamo/backends/debugging.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_dynamo/backends/cudagraphs.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/_dynamo/backends/cudagraphs.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_dynamo/backends/common.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/_dynamo/backends/common.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_dynamo/backends/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/_dynamo/backends/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_dynamo/_trace_wrapped_higher_order_op.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/_dynamo/_trace_wrapped_higher_order_op.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_dynamo/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/_dynamo/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_dispatch/python.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/_dispatch/python.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_dispatch/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/_dispatch/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_deploy.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/_deploy.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_decomp/decompositions_for_rng.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/_decomp/decompositions_for_rng.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_decomp/decompositions_for_jvp.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/_decomp/decompositions_for_jvp.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_decomp/decompositions.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/_decomp/decompositions.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_decomp/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/_decomp/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_custom_ops.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/_custom_ops.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_custom_op/impl.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/_custom_op/impl.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_custom_op/functional.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/_custom_op/functional.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_custom_op/autograd.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/_custom_op/autograd.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_custom_op/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/_custom_op/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_compile.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/_compile.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_classes.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/_classes.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_awaits/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/_awaits/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_appdirs.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/_appdirs.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/__future__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/__future__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/__config__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/__config__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_VF.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torch/_VF.py ++ find ./torchgen/ -name '*.py' + for f in `find ./torchgen/ -name '*.py'` + install -D -pm 644 ./torchgen/yaml_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torchgen/yaml_utils.py + for f in `find ./torchgen/ -name '*.py'` + install -D -pm 644 ./torchgen/utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torchgen/utils.py + for f in `find ./torchgen/ -name '*.py'` + install -D -pm 644 ./torchgen/static_runtime/generator.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torchgen/static_runtime/generator.py + for f in `find ./torchgen/ -name '*.py'` + install -D -pm 644 ./torchgen/static_runtime/gen_static_runtime_ops.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torchgen/static_runtime/gen_static_runtime_ops.py + for f in `find ./torchgen/ -name '*.py'` + install -D -pm 644 ./torchgen/static_runtime/config.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torchgen/static_runtime/config.py + for f in `find ./torchgen/ -name '*.py'` + install -D -pm 644 ./torchgen/static_runtime/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torchgen/static_runtime/__init__.py + for f in `find ./torchgen/ -name '*.py'` + install -D -pm 644 ./torchgen/shape_functions/gen_jit_shape_functions.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torchgen/shape_functions/gen_jit_shape_functions.py + for f in `find ./torchgen/ -name '*.py'` + install -D -pm 644 ./torchgen/selective_build/selector.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torchgen/selective_build/selector.py + for f in `find ./torchgen/ -name '*.py'` + install -D -pm 644 ./torchgen/selective_build/operator.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torchgen/selective_build/operator.py + for f in `find ./torchgen/ -name '*.py'` + install -D -pm 644 ./torchgen/selective_build/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torchgen/selective_build/__init__.py + for f in `find ./torchgen/ -name '*.py'` + install -D -pm 644 ./torchgen/operator_versions/gen_mobile_upgraders_constant.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torchgen/operator_versions/gen_mobile_upgraders_constant.py + for f in `find ./torchgen/ -name '*.py'` + install -D -pm 644 ./torchgen/operator_versions/gen_mobile_upgraders.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torchgen/operator_versions/gen_mobile_upgraders.py + for f in `find ./torchgen/ -name '*.py'` + install -D -pm 644 ./torchgen/operator_versions/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torchgen/operator_versions/__init__.py + for f in `find ./torchgen/ -name '*.py'` + install -D -pm 644 ./torchgen/native_function_generation.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torchgen/native_function_generation.py + for f in `find ./torchgen/ -name '*.py'` + install -D -pm 644 ./torchgen/model.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torchgen/model.py + for f in `find ./torchgen/ -name '*.py'` + install -D -pm 644 ./torchgen/local.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torchgen/local.py + for f in `find ./torchgen/ -name '*.py'` + install -D -pm 644 ./torchgen/gen_vmap_plumbing.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torchgen/gen_vmap_plumbing.py + for f in `find ./torchgen/ -name '*.py'` + install -D -pm 644 ./torchgen/gen_lazy_tensor.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torchgen/gen_lazy_tensor.py + for f in `find ./torchgen/ -name '*.py'` + install -D -pm 644 ./torchgen/gen_functionalization_type.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torchgen/gen_functionalization_type.py + for f in `find ./torchgen/ -name '*.py'` + install -D -pm 644 ./torchgen/gen_executorch.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torchgen/gen_executorch.py + for f in `find ./torchgen/ -name '*.py'` + install -D -pm 644 ./torchgen/gen_backend_stubs.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torchgen/gen_backend_stubs.py + for f in `find ./torchgen/ -name '*.py'` + install -D -pm 644 ./torchgen/gen_aoti_c_shim.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torchgen/gen_aoti_c_shim.py + for f in `find ./torchgen/ -name '*.py'` + install -D -pm 644 ./torchgen/gen.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torchgen/gen.py + for f in `find ./torchgen/ -name '*.py'` + install -D -pm 644 ./torchgen/fuse/gen_patterns.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torchgen/fuse/gen_patterns.py + for f in `find ./torchgen/ -name '*.py'` + install -D -pm 644 ./torchgen/executorch/parse.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torchgen/executorch/parse.py + for f in `find ./torchgen/ -name '*.py'` + install -D -pm 644 ./torchgen/executorch/model.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torchgen/executorch/model.py + for f in `find ./torchgen/ -name '*.py'` + install -D -pm 644 ./torchgen/executorch/api/unboxing.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torchgen/executorch/api/unboxing.py + for f in `find ./torchgen/ -name '*.py'` + install -D -pm 644 ./torchgen/executorch/api/types/types.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torchgen/executorch/api/types/types.py + for f in `find ./torchgen/ -name '*.py'` + install -D -pm 644 ./torchgen/executorch/api/types/signatures.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torchgen/executorch/api/types/signatures.py + for f in `find ./torchgen/ -name '*.py'` + install -D -pm 644 ./torchgen/executorch/api/types/__init__.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torchgen/executorch/api/types/__init__.py + for f in `find ./torchgen/ -name '*.py'` + install -D -pm 644 ./torchgen/executorch/api/et_cpp.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torchgen/executorch/api/et_cpp.py + for f in `find ./torchgen/ -name '*.py'` + install -D -pm 644 ./torchgen/executorch/api/custom_ops.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torchgen/executorch/api/custom_ops.py + for f in `find ./torchgen/ -name '*.py'` + install -D -pm 644 ./torchgen/executorch/api/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torchgen/executorch/api/__init__.py + for f in `find ./torchgen/ -name '*.py'` + install -D -pm 644 ./torchgen/executorch/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torchgen/executorch/__init__.py + for f in `find ./torchgen/ -name '*.py'` + install -D -pm 644 ./torchgen/dest/ufunc.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torchgen/dest/ufunc.py + for f in `find ./torchgen/ -name '*.py'` + install -D -pm 644 ./torchgen/dest/register_dispatch_key.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torchgen/dest/register_dispatch_key.py + for f in `find ./torchgen/ -name '*.py'` + install -D -pm 644 ./torchgen/dest/native_functions.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torchgen/dest/native_functions.py + for f in `find ./torchgen/ -name '*.py'` + install -D -pm 644 ./torchgen/dest/lazy_ts_lowering.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torchgen/dest/lazy_ts_lowering.py + for f in `find ./torchgen/ -name '*.py'` + install -D -pm 644 ./torchgen/dest/lazy_ir.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torchgen/dest/lazy_ir.py + for f in `find ./torchgen/ -name '*.py'` + install -D -pm 644 ./torchgen/dest/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torchgen/dest/__init__.py + for f in `find ./torchgen/ -name '*.py'` + install -D -pm 644 ./torchgen/decompositions/gen_jit_decompositions.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torchgen/decompositions/gen_jit_decompositions.py + for f in `find ./torchgen/ -name '*.py'` + install -D -pm 644 ./torchgen/context.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torchgen/context.py + for f in `find ./torchgen/ -name '*.py'` + install -D -pm 644 ./torchgen/code_template.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torchgen/code_template.py + for f in `find ./torchgen/ -name '*.py'` + install -D -pm 644 ./torchgen/api/unboxing.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torchgen/api/unboxing.py + for f in `find ./torchgen/ -name '*.py'` + install -D -pm 644 ./torchgen/api/ufunc.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torchgen/api/ufunc.py + for f in `find ./torchgen/ -name '*.py'` + install -D -pm 644 ./torchgen/api/types/types_base.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torchgen/api/types/types_base.py + for f in `find ./torchgen/ -name '*.py'` + install -D -pm 644 ./torchgen/api/types/types.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torchgen/api/types/types.py + for f in `find ./torchgen/ -name '*.py'` + install -D -pm 644 ./torchgen/api/types/signatures.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torchgen/api/types/signatures.py + for f in `find ./torchgen/ -name '*.py'` + install -D -pm 644 ./torchgen/api/types/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torchgen/api/types/__init__.py + for f in `find ./torchgen/ -name '*.py'` + install -D -pm 644 ./torchgen/api/translate.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torchgen/api/translate.py + for f in `find ./torchgen/ -name '*.py'` + install -D -pm 644 ./torchgen/api/structured.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torchgen/api/structured.py + for f in `find ./torchgen/ -name '*.py'` + install -D -pm 644 ./torchgen/api/python.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torchgen/api/python.py + for f in `find ./torchgen/ -name '*.py'` + install -D -pm 644 ./torchgen/api/native.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torchgen/api/native.py + for f in `find ./torchgen/ -name '*.py'` + install -D -pm 644 ./torchgen/api/meta.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torchgen/api/meta.py + for f in `find ./torchgen/ -name '*.py'` + install -D -pm 644 ./torchgen/api/lazy.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torchgen/api/lazy.py + for f in `find ./torchgen/ -name '*.py'` + install -D -pm 644 ./torchgen/api/functionalization.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torchgen/api/functionalization.py + for f in `find ./torchgen/ -name '*.py'` + install -D -pm 644 ./torchgen/api/dispatcher.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torchgen/api/dispatcher.py + for f in `find ./torchgen/ -name '*.py'` + install -D -pm 644 ./torchgen/api/cpp.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torchgen/api/cpp.py + for f in `find ./torchgen/ -name '*.py'` + install 
-D -pm 644 ./torchgen/api/autograd.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torchgen/api/autograd.py + for f in `find ./torchgen/ -name '*.py'` + install -D -pm 644 ./torchgen/api/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torchgen/api/__init__.py + for f in `find ./torchgen/ -name '*.py'` + install -D -pm 644 ./torchgen/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./torchgen/__init__.py ++ find ./functorch/ -name '*.py' + for f in `find ./functorch/ -name '*.py'` + install -D -pm 644 ./functorch/op_analysis/gen_data.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./functorch/op_analysis/gen_data.py + for f in `find ./functorch/ -name '*.py'` + install -D -pm 644 ./functorch/notebooks/_src/plot_per_sample_gradients.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./functorch/notebooks/_src/plot_per_sample_gradients.py + for f in `find ./functorch/ -name '*.py'` + install -D -pm 644 ./functorch/notebooks/_src/plot_jacobians_and_hessians.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./functorch/notebooks/_src/plot_jacobians_and_hessians.py + for f in `find ./functorch/ -name '*.py'` + install -D -pm 644 ./functorch/notebooks/_src/plot_ensembling.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./functorch/notebooks/_src/plot_ensembling.py + for f in `find ./functorch/ -name '*.py'` + install -D -pm 644 ./functorch/experimental/ops.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./functorch/experimental/ops.py + for f in `find ./functorch/ -name '*.py'` + install -D -pm 644 ./functorch/experimental/control_flow.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./functorch/experimental/control_flow.py + for f in `find ./functorch/ -name '*.py'` + install -D -pm 644 ./functorch/experimental/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./functorch/experimental/__init__.py + for f in `find ./functorch/ -name '*.py'` + install -D -pm 644 ./functorch/examples/maml_regression/evjang_transforms_module.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./functorch/examples/maml_regression/evjang_transforms_module.py + for f in `find ./functorch/ -name '*.py'` + install -D -pm 644 ./functorch/examples/maml_regression/evjang_transforms.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./functorch/examples/maml_regression/evjang_transforms.py + for f in `find ./functorch/ -name '*.py'` + install -D -pm 644 ./functorch/examples/maml_regression/evjang.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./functorch/examples/maml_regression/evjang.py + for f in `find ./functorch/ -name '*.py'` + install -D -pm 644 
./functorch/examples/maml_omniglot/support/omniglot_loaders.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./functorch/examples/maml_omniglot/support/omniglot_loaders.py + for f in `find ./functorch/ -name '*.py'` + install -D -pm 644 ./functorch/examples/maml_omniglot/maml-omniglot-transforms.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./functorch/examples/maml_omniglot/maml-omniglot-transforms.py + for f in `find ./functorch/ -name '*.py'` + install -D -pm 644 ./functorch/examples/maml_omniglot/maml-omniglot-ptonly.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./functorch/examples/maml_omniglot/maml-omniglot-ptonly.py + for f in `find ./functorch/ -name '*.py'` + install -D -pm 644 ./functorch/examples/maml_omniglot/maml-omniglot-higher.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./functorch/examples/maml_omniglot/maml-omniglot-higher.py + for f in `find ./functorch/ -name '*.py'` + install -D -pm 644 ./functorch/examples/lennard_jones/lennard_jones.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./functorch/examples/lennard_jones/lennard_jones.py + for f in `find ./functorch/ -name '*.py'` + install -D -pm 644 ./functorch/examples/ensembling/parallel_train.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./functorch/examples/ensembling/parallel_train.py + for f in `find ./functorch/ -name '*.py'` + install -D -pm 644 ./functorch/examples/dp_cifar10/cifar10_transforms.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./functorch/examples/dp_cifar10/cifar10_transforms.py + for f in `find ./functorch/ -name '*.py'` + install -D -pm 644 ./functorch/examples/dp_cifar10/cifar10_opacus.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./functorch/examples/dp_cifar10/cifar10_opacus.py + for f in `find ./functorch/ -name '*.py'` + install -D -pm 644 ./functorch/examples/compilation/simple_function.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./functorch/examples/compilation/simple_function.py + for f in `find ./functorch/ -name '*.py'` + install -D -pm 644 ./functorch/examples/compilation/linear_train.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./functorch/examples/compilation/linear_train.py + for f in `find ./functorch/ -name '*.py'` + install -D -pm 644 ./functorch/examples/compilation/fuse_module.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./functorch/examples/compilation/fuse_module.py + for f in `find ./functorch/ -name '*.py'` + install -D -pm 644 ./functorch/examples/compilation/eager_fusion.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./functorch/examples/compilation/eager_fusion.py + for f in `find ./functorch/ -name '*.py'` + install -D -pm 644 ./functorch/einops/rearrange.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./functorch/einops/rearrange.py + for f in `find ./functorch/ -name '*.py'` + install -D -pm 644 ./functorch/einops/_parsing.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./functorch/einops/_parsing.py + for f in `find ./functorch/ -name '*.py'` + install -D -pm 644 ./functorch/einops/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./functorch/einops/__init__.py + for f in `find ./functorch/ -name '*.py'` + install -D -pm 644 ./functorch/docs/source/conf.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./functorch/docs/source/conf.py + for f in `find ./functorch/ -name '*.py'` + install -D -pm 644 ./functorch/dim/wrap_type.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./functorch/dim/wrap_type.py + for f in `find ./functorch/ -name '*.py'` + install -D -pm 644 ./functorch/dim/tree_map.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./functorch/dim/tree_map.py + for f in `find ./functorch/ -name '*.py'` + install -D -pm 644 ./functorch/dim/reference.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./functorch/dim/reference.py + for f in `find ./functorch/ -name '*.py'` + install -D -pm 644 ./functorch/dim/op_properties.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./functorch/dim/op_properties.py + for f in `find ./functorch/ -name '*.py'` + install -D -pm 644 ./functorch/dim/magic_trace.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./functorch/dim/magic_trace.py + for f in `find ./functorch/ -name '*.py'` + install -D -pm 644 ./functorch/dim/dim.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./functorch/dim/dim.py + for f in `find ./functorch/ -name '*.py'` + install -D -pm 644 ./functorch/dim/delayed_mul_tensor.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./functorch/dim/delayed_mul_tensor.py + for f in `find ./functorch/ -name '*.py'` + install -D -pm 644 ./functorch/dim/batch_tensor.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./functorch/dim/batch_tensor.py + for f in `find ./functorch/ -name '*.py'` + install -D -pm 644 ./functorch/dim/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./functorch/dim/__init__.py + for f in `find ./functorch/ -name '*.py'` + install -D -pm 644 ./functorch/compile/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./functorch/compile/__init__.py + for f in `find ./functorch/ -name '*.py'` + install -D -pm 644 ./functorch/benchmarks/process_scorecard.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./functorch/benchmarks/process_scorecard.py
+ for f in `find ./functorch/ -name '*.py'`
+ install -D -pm 644 ./functorch/benchmarks/pointwise_scorecard.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./functorch/benchmarks/pointwise_scorecard.py
+ for f in `find ./functorch/ -name '*.py'`
+ install -D -pm 644 ./functorch/benchmarks/per_sample_grads.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./functorch/benchmarks/per_sample_grads.py
+ for f in `find ./functorch/ -name '*.py'`
+ install -D -pm 644 ./functorch/benchmarks/operator_authoring.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./functorch/benchmarks/operator_authoring.py
+ for f in `find ./functorch/ -name '*.py'`
+ install -D -pm 644 ./functorch/benchmarks/cse.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./functorch/benchmarks/cse.py
+ for f in `find ./functorch/ -name '*.py'`
+ install -D -pm 644 ./functorch/benchmarks/chrome_trace_parser.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./functorch/benchmarks/chrome_trace_parser.py
+ for f in `find ./functorch/ -name '*.py'`
+ install -D -pm 644 ./functorch/_src/vmap/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./functorch/_src/vmap/__init__.py
+ for f in `find ./functorch/ -name '*.py'`
+ install -D -pm 644 ./functorch/_src/make_functional/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./functorch/_src/make_functional/__init__.py
+ for f in `find ./functorch/ -name '*.py'`
+ install -D -pm 644 ./functorch/_src/eager_transforms/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./functorch/_src/eager_transforms/__init__.py
+ for f in `find ./functorch/ -name '*.py'`
+ install -D -pm 644 ./functorch/_src/aot_autograd/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./functorch/_src/aot_autograd/__init__.py
+ for f in `find ./functorch/ -name '*.py'`
+ install -D -pm 644 ./functorch/_src/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./functorch/_src/__init__.py
+ for f in `find ./functorch/ -name '*.py'`
+ install -D -pm 644 ./functorch/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/python3.12/site-packages/./functorch/__init__.py
++ /usr/local/cuda/bin/nvcc --version
++ grep release
++ awk '{print $2}'
++ cut -d, -f2
+ cuver=12.3
+ echo 'from typing import Optional'
+ echo '__all__ = ['\''__version__'\'', '\''debug'\'', '\''cuda'\'', '\''git_version'\'', '\''hip'\'']'
+ echo '__version__ = '\''2.4.0'\'''
+ echo 'debug = False'
+ echo 'cuda: Optional[str] = '\''12.3'\'''
+ echo 'git_version = '\''7efaf54dc46034189cb36b345764a5a9a5b693d4'\'''
+ echo 'hip: Optional[str] = None'
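The trace above is the tail end of three identical loops that copy every Python source file under ./torch/, ./torchgen/ and ./functorch/ into the buildroot's site-packages, followed by the regeneration of the version module with the CUDA release reported by nvcc. A minimal sketch of the %install logic this implies, reconstructed from the trace rather than copied from pytorch.spec: the %{buildroot} and %{python3_sitearch} macros and the version.py target file are assumptions (xtrace does not show redirections), and the pipeline is written in the one ordering of the traced stages that actually yields '12.3'.

# Sketch of the %install steps implied by the trace (hypothetical reconstruction).
for f in $(find ./torch/ ./torchgen/ ./functorch/ -name '*.py'); do
    # install -D creates missing parent directories, -p preserves mtimes,
    # 644 = rw-r--r--. $f keeps its leading "./", which is why the destination
    # paths in the log contain "site-packages/./torch/...".
    install -D -pm 644 "$f" %{buildroot}%{python3_sitearch}/"$f"
done

# Derive the CUDA release from nvcc: grep keeps the line
# "Cuda compilation tools, release 12.3, V12.3...", cut -d, -f2 reduces it to
# " release 12.3", and awk picks the version token.
cuver=$(/usr/local/cuda/bin/nvcc --version | grep release | cut -d, -f2 | awk '{print $2}')

# Regenerate the version module for the packaged build (target file assumed).
cat > %{buildroot}%{python3_sitearch}/torch/version.py <<EOF
from typing import Optional
__all__ = ['__version__', 'debug', 'cuda', 'git_version', 'hip']
__version__ = '2.4.0'
debug = False
cuda: Optional[str] = '$cuver'
git_version = '7efaf54dc46034189cb36b345764a5a9a5b693d4'
hip: Optional[str] = None
EOF

Rewriting version.py matters because the packaged torch reports torch.version.cuda and torch.version.git_version from this file; regenerating it pins the metadata to the toolchain actually used in the chroot.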
+ mv -f /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//builddir/build/BUILD/pytorch/nvfuser/nvfuser.so /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/
mv: cannot stat '/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//builddir/build/BUILD/pytorch/nvfuser/nvfuser.so': No such file or directory
+ true
+ mv -f /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//builddir/build/BUILD/pytorch/torch/lib/libnvfuser_codegen.so /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/
mv: cannot stat '/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//builddir/build/BUILD/pytorch/torch/lib/libnvfuser_codegen.so': No such file or directory
+ true
+ rm -rf /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/include/fmt
+ rm -rf /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/include/clog.h
+ rm -rf /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/include/xnnpack.h
+ rm -rf /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//builddir/build/BUILD/pytorch/test
+ rm -rf /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//builddir/build/BUILD/pytorch/nvfuser
+ rm -rf /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/cmake/fmt
+ rm -rf /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64//usr/lib64/pkgconfig/fmt.pc
+ find /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64 -name functorch.so -exec rm -f '{}' ';'
+ /usr/bin/python3 setup.py egg_info
Building wheel torch-2.4.0a0+git7efaf54
running egg_info
creating torch.egg-info
writing torch.egg-info/PKG-INFO
writing dependency_links to torch.egg-info/dependency_links.txt
writing entry points to torch.egg-info/entry_points.txt
writing requirements to torch.egg-info/requires.txt
writing top-level names to torch.egg-info/top_level.txt
writing manifest file 'torch.egg-info/SOURCES.txt'
reading manifest file 'torch.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
warning: no previously-included files matching '*.o' found anywhere in distribution
warning: no previously-included files matching '*.so' found anywhere in distribution
warning: no previously-included files matching '*.dylib' found anywhere in distribution
warning: no previously-included files matching '*.a' found anywhere in distribution
warning: no previously-included files matching '*.swp' found anywhere in distribution
adding license file 'LICENSE'
adding license file 'NOTICE'
writing manifest file 'torch.egg-info/SOURCES.txt'
+ cp -r torch.egg-info /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64/usr/lib64/python3.12/site-packages/
+ sed -i '/^\[/!s/[<=>].*//g' /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64/usr/lib64/python3.12/site-packages/torch.egg-info/requires.txt
+ sed -i /triton/d /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64/usr/lib64/python3.12/site-packages/torch.egg-info/requires.txt
+ set +x
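Two details of this cleanup are easy to miss in the flattened trace. First, the failed mv calls are tolerated: each is followed by true, so a missing nvfuser build product (this tree evidently no longer produces nvfuser.so or libnvfuser_codegen.so) does not abort the packaging. Second, the two sed passes normalize the egg-info dependency metadata so RPM's Python dependency generator does not emit pinned versions or a triton requirement. A commented sketch of what the seds do; the sample requires.txt lines are illustrative, not taken from this build:

# requires.txt before the sed passes might look like (illustrative):
#   filelock>=3.13
#   triton==2.2.0
#   [opt-einsum]
#   opt-einsum>=3.3
sed -i '/^\[/!s/[<=>].*//g' requires.txt  # on lines NOT starting with '[' (i.e. not
                                          # extras-section headers), delete everything
                                          # from the first <, = or > to end of line
sed -i /triton/d requires.txt             # then drop the triton dependency entirely
# after both passes:
#   filelock
#   [opt-einsum]
#   opt-einsum

The order matters: the first pass reduces "triton==2.2.0" to "triton", and the second deletes that line; run the other way around the result is the same here, but only because /triton/ matches the pinned form too.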
Stripping: /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64/usr/bin/torch_shm_manager
Stripping: /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64/usr/lib64/libc10.so.2.4.0
Stripping: /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64/usr/lib64/libc10_cuda.so
Stripping: /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64/usr/lib64/libcaffe2_nvrtc.so
Stripping: /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64/usr/lib64/libnnapi_backend.so
Stripping: /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64/usr/lib64/libshm.so.2.4.0
Stripping: /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64/usr/lib64/libtorch.so.2.4.0
Stripping: /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64/usr/lib64/libtorch_cpu.so.2.4.0
Stripping: /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64/usr/lib64/libtorch_cuda.so
Stripping: /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64/usr/lib64/libtorch_cuda_linalg.so
Stripping: /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64/usr/lib64/libtorch_global_deps.so.2.4.0
Stripping: /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64/usr/lib64/libtorch_python.so.2.4.0
Stripping: /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64/usr/lib64/python3.12/site-packages/functorch/_C.so
Stripping: /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64/usr/lib64/python3.12/site-packages/torch/_C.so
+ /usr/lib/rpm/check-buildroot
+ /usr/lib/rpm/redhat/brp-ldconfig
+ /usr/lib/rpm/brp-compress
+ /usr/lib/rpm/brp-strip /usr/bin/strip
+ /usr/lib/rpm/brp-strip-comment-note /usr/bin/strip /usr/bin/objdump
+ /usr/lib/rpm/redhat/brp-strip-lto /usr/bin/strip
+ /usr/lib/rpm/brp-strip-static-archive /usr/bin/strip
+ /usr/lib/rpm/check-rpaths
+ /usr/lib/rpm/redhat/brp-mangle-shebangs
+ /usr/lib/rpm/brp-remove-la-files
+ env /usr/lib/rpm/redhat/brp-python-bytecompile '' 1 0 -j4
Bytecompiling .py files below /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64/usr/lib64/python3.12 using python3.12
+ /usr/lib/rpm/redhat/brp-python-hardlink
Processing files: pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64
Executing(%doc): /bin/sh -e /var/tmp/rpm-tmp.thEJYG
+ umask 022
+ cd /builddir/build/BUILD
+ cd pytorch
+ DOCDIR=/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64/usr/share/doc/pytorch
+ export LC_ALL=
+ LC_ALL=
+ export DOCDIR
+ /usr/bin/mkdir -p /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64/usr/share/doc/pytorch
+ cp -pr /builddir/build/BUILD/pytorch/README.md /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64/usr/share/doc/pytorch
+ cp -pr /builddir/build/BUILD/pytorch/CONTRIBUTING.md /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64/usr/share/doc/pytorch
+ RPM_EC=0
++ jobs -p
+ exit 0
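The "Stripping:" lines are printed before the standard brp-* helpers run, which suggests the spec does its own stripping pass over the large shared objects (debug sections of a CUDA-enabled libtorch are enormous, so this keeps the payload manageable). A hypothetical reconstruction of such a pass, not the actual spec code; the find pattern is a guess based only on which files the log shows being stripped:

# sketch, assuming binutils strip and a filename-based selection
find %{buildroot} -type f \( -name '*.so' -o -name '*.so.*' -o -name torch_shm_manager \) |
while read -r f; do
    echo "Stripping: $f"
    strip "$f" || true   # tolerate anything strip refuses to touch
done

After this, the brp helpers do the distribution-wide housekeeping: brp-compress compresses man pages, brp-mangle-shebangs normalizes script interpreters, and brp-python-bytecompile generates the .pyc files for every module installed under /usr/lib64/python3.12.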
Executing(%license): /bin/sh -e /var/tmp/rpm-tmp.coVi89
+ umask 022
+ cd /builddir/build/BUILD
+ cd pytorch
+ LICENSEDIR=/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64/usr/share/licenses/pytorch
+ export LC_ALL=
+ LC_ALL=
+ export LICENSEDIR
+ /usr/bin/mkdir -p /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64/usr/share/licenses/pytorch
+ cp -pr /builddir/build/BUILD/pytorch/LICENSE /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64/usr/share/licenses/pytorch
+ RPM_EC=0
++ jobs -p
+ exit 0
Provides: libc10.so.2.4()(64bit) libc10_cuda.so()(64bit) libcaffe2_nvrtc.so()(64bit) libnnapi_backend.so()(64bit) libshm.so.2.4()(64bit) libtorch.so.2.4()(64bit) libtorch_cpu.so.2.4()(64bit) libtorch_cuda.so()(64bit) libtorch_cuda_linalg.so()(64bit) libtorch_global_deps.so.2.4()(64bit) pytorch = 2.4.0-20240412.0.git7efaf54d.cu12_3.fc39 pytorch(aarch-64) = 2.4.0-20240412.0.git7efaf54d.cu12_3.fc39
Requires(rpmlib): rpmlib(CompressedFileNames) <= 3.0.4-1 rpmlib(FileDigests) <= 4.6.0-1 rpmlib(PayloadFilesHavePrefix) <= 4.0-1
Requires: libc.so.6()(64bit) libc.so.6(GLIBC_2.17)(64bit) libc.so.6(GLIBC_2.28)(64bit) libc.so.6(GLIBC_2.32)(64bit) libc.so.6(GLIBC_2.33)(64bit) libc.so.6(GLIBC_2.34)(64bit) libc.so.6(GLIBC_2.38)(64bit) libc10.so.2.4()(64bit) libc10_cuda.so()(64bit) libcpuinfo.so.1()(64bit) libcublas.so.12()(64bit) libcublas.so.12(libcublas.so.12)(64bit) libcublasLt.so.12()(64bit) libcublasLt.so.12(libcublasLt.so.12)(64bit) libcuda.so.1()(64bit) libcudart.so.12()(64bit) libcudart.so.12(libcudart.so.12)(64bit) libcudnn.so.8()(64bit) libcudnn.so.8(libcudnn.so.8)(64bit) libcufft.so.11()(64bit) libcufft.so.11(libcufft.so.11)(64bit) libcurand.so.10()(64bit) libcusolver.so.11()(64bit) libcusolver.so.11(libcusolver.so.11)(64bit) libcusparse.so.12()(64bit) libcusparse.so.12(libcusparse.so.12)(64bit) libfoxi_loader.so.1()(64bit) libgcc_s.so.1()(64bit) libgcc_s.so.1(GCC_3.0)(64bit) libgcc_s.so.1(GCC_4.2.0)(64bit) libgcc_s.so.1(GCC_4.5.0)(64bit) libgflags.so.2.2()(64bit) libglog.so.0()(64bit) libgloo.so.1()(64bit) libgloo_cuda.so.1()(64bit) libgomp.so.1()(64bit) libgomp.so.1(GOMP_4.0)(64bit) libgomp.so.1(OMP_1.0)(64bit) libhiredis.so.1.0.0()(64bit) libkineto.so.1()(64bit) libleveldb.so.1()(64bit) liblmdb.so.0.0.0()(64bit) libm.so.6()(64bit) libm.so.6(GLIBC_2.17)(64bit) libm.so.6(GLIBC_2.23)(64bit) libm.so.6(GLIBC_2.27)(64bit) libm.so.6(GLIBC_2.29)(64bit) libm.so.6(GLIBC_2.35)(64bit) libm.so.6(GLIBC_2.38)(64bit) libmagma.so.1()(64bit) libnccl.so.2()(64bit) libnnpack.so.1()(64bit) libnuma.so.1()(64bit) libnuma.so.1(libnuma_1.1)(64bit) libnuma.so.1(libnuma_1.2)(64bit) libnvToolsExt.so.1()(64bit) libnvToolsExt.so.1(libnvToolsExt.so.1)(64bit) libnvrtc.so.12()(64bit) libnvrtc.so.12(libnvrtc.so.12)(64bit) libonnx.so()(64bit) libonnx_optimizer.so()(64bit) libonnx_proto.so()(64bit) libopenblaso.so.0()(64bit) libopencv_calib3d.so.409()(64bit) libopencv_core.so.409()(64bit) libopencv_cudev.so.409()(64bit) libopencv_dnn.so.409()(64bit) libopencv_features2d.so.409()(64bit) libopencv_flann.so.409()(64bit) libopencv_highgui.so.409()(64bit) libopencv_imgcodecs.so.409()(64bit) libopencv_imgproc.so.409()(64bit) libopencv_optflow.so.409()(64bit) libopencv_video.so.409()(64bit) libopencv_videoio.so.409()(64bit) libopencv_ximgproc.so.409()(64bit) libprotobuf.so.32()(64bit) libpthreadpool.so.1()(64bit) libqnnpack.so.1()(64bit) libshm.so.2.4()(64bit) libsleef.so.3()(64bit) libsnappy.so.1()(64bit) libstdc++.so.6()(64bit) libstdc++.so.6(CXXABI_1.3)(64bit) libstdc++.so.6(CXXABI_1.3.11)(64bit) libstdc++.so.6(CXXABI_1.3.13)(64bit) libstdc++.so.6(CXXABI_1.3.2)(64bit) libstdc++.so.6(CXXABI_1.3.3)(64bit) libstdc++.so.6(CXXABI_1.3.5)(64bit) libstdc++.so.6(CXXABI_1.3.7)(64bit) libstdc++.so.6(CXXABI_1.3.8)(64bit) libstdc++.so.6(CXXABI_1.3.9)(64bit) libstdc++.so.6(GLIBCXX_3.4)(64bit) libstdc++.so.6(GLIBCXX_3.4.11)(64bit) libstdc++.so.6(GLIBCXX_3.4.14)(64bit) libstdc++.so.6(GLIBCXX_3.4.15)(64bit) libstdc++.so.6(GLIBCXX_3.4.17)(64bit) libstdc++.so.6(GLIBCXX_3.4.18)(64bit) libstdc++.so.6(GLIBCXX_3.4.19)(64bit) libstdc++.so.6(GLIBCXX_3.4.20)(64bit) libstdc++.so.6(GLIBCXX_3.4.21)(64bit) libstdc++.so.6(GLIBCXX_3.4.22)(64bit) libstdc++.so.6(GLIBCXX_3.4.26)(64bit) libstdc++.so.6(GLIBCXX_3.4.29)(64bit) libstdc++.so.6(GLIBCXX_3.4.30)(64bit) libstdc++.so.6(GLIBCXX_3.4.32)(64bit) libstdc++.so.6(GLIBCXX_3.4.9)(64bit) libtensorpipe.so.1()(64bit) libtensorpipe_cuda.so.1()(64bit) libtorch.so.2.4()(64bit) libtorch_cpu.so.2.4()(64bit) libtorch_cuda.so()(64bit) libtorch_python.so.2.4()(64bit) libzmq.so.5()(64bit) rtld(GNU_HASH)
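The Provides: and Requires: lists here are not written by hand; rpmbuild's dependency generators extract them from the packaged files (ELF sonames and symbol versions for the libraries, dist-info/egg-info metadata for the Python requirements, which is why the requires.txt rewrite above directly shapes this output). To inspect the same metadata on the finished packages, the standard rpm query flags work:

# query the auto-generated dependency metadata of the built RPM
rpm -qp --provides /builddir/build/RPMS/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64.rpm
rpm -qp --requires /builddir/build/RPMS/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64.rpm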
Processing files: pytorch-devel-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64
Provides: cmake(ATen) cmake(Caffe2) cmake(Torch) = 2.4.0 cmake(aten) cmake(caffe2) cmake(torch) = 2.4.0 pytorch-devel = 2.4.0-20240412.0.git7efaf54d.cu12_3.fc39 pytorch-devel(aarch-64) = 2.4.0-20240412.0.git7efaf54d.cu12_3.fc39
Requires(rpmlib): rpmlib(CompressedFileNames) <= 3.0.4-1 rpmlib(FileDigests) <= 4.6.0-1 rpmlib(PayloadFilesHavePrefix) <= 4.0-1
Requires: cmake-filesystem libc10.so.2.4()(64bit) libshm.so.2.4()(64bit) libtorch.so.2.4()(64bit) libtorch_cpu.so.2.4()(64bit) libtorch_global_deps.so.2.4()(64bit)
Processing files: pytorch-python3-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64
warning: absolute symlink: /usr/lib64/python3.12/site-packages/torch/bin/torch_shm_manager -> /usr/bin/torch_shm_manager
warning: absolute symlink: /usr/lib64/python3.12/site-packages/torch/include -> /usr/include
warning: absolute symlink: /usr/lib64/python3.12/site-packages/torch/lib -> /usr/lib64
Provides: libtorch_python.so.2.4()(64bit) python3.12dist(torch) = 2.4.0 python3.12dist(torch) = 2.4~a0 python3dist(torch) = 2.4~a0 pytorch-python3 = 2.4.0-20240412.0.git7efaf54d.cu12_3.fc39 pytorch-python3(aarch-64) = 2.4.0-20240412.0.git7efaf54d.cu12_3.fc39
Requires(rpmlib): rpmlib(CompressedFileNames) <= 3.0.4-1 rpmlib(FileDigests) <= 4.6.0-1 rpmlib(PartialHardlinkSets) <= 4.0.4-1 rpmlib(PayloadFilesHavePrefix) <= 4.0-1
Requires: libc.so.6()(64bit) libc.so.6(GLIBC_2.17)(64bit) libc.so.6(GLIBC_2.32)(64bit) libc.so.6(GLIBC_2.34)(64bit) libc.so.6(GLIBC_2.38)(64bit) libc10.so.2.4()(64bit) libc10_cuda.so()(64bit) libcudart.so.12()(64bit) libcudart.so.12(libcudart.so.12)(64bit) libcudnn.so.8()(64bit) libcudnn.so.8(libcudnn.so.8)(64bit) libgcc_s.so.1()(64bit) libgcc_s.so.1(GCC_3.0)(64bit) libgcc_s.so.1(GCC_4.5.0)(64bit) libglog.so.0()(64bit) libnvToolsExt.so.1()(64bit) libnvToolsExt.so.1(libnvToolsExt.so.1)(64bit) libprotobuf.so.32()(64bit) libshm.so.2.4()(64bit) libstdc++.so.6()(64bit) libstdc++.so.6(CXXABI_1.3)(64bit) libstdc++.so.6(CXXABI_1.3.11)(64bit) libstdc++.so.6(CXXABI_1.3.13)(64bit) libstdc++.so.6(CXXABI_1.3.2)(64bit) libstdc++.so.6(CXXABI_1.3.3)(64bit) libstdc++.so.6(CXXABI_1.3.5)(64bit) libstdc++.so.6(CXXABI_1.3.8)(64bit) libstdc++.so.6(CXXABI_1.3.9)(64bit) libstdc++.so.6(GLIBCXX_3.4)(64bit) libstdc++.so.6(GLIBCXX_3.4.11)(64bit) libstdc++.so.6(GLIBCXX_3.4.14)(64bit) libstdc++.so.6(GLIBCXX_3.4.15)(64bit) libstdc++.so.6(GLIBCXX_3.4.18)(64bit) libstdc++.so.6(GLIBCXX_3.4.19)(64bit) libstdc++.so.6(GLIBCXX_3.4.20)(64bit) libstdc++.so.6(GLIBCXX_3.4.21)(64bit) libstdc++.so.6(GLIBCXX_3.4.22)(64bit) libstdc++.so.6(GLIBCXX_3.4.26)(64bit) libstdc++.so.6(GLIBCXX_3.4.29)(64bit) libstdc++.so.6(GLIBCXX_3.4.30)(64bit) libstdc++.so.6(GLIBCXX_3.4.32)(64bit) libstdc++.so.6(GLIBCXX_3.4.9)(64bit) libtorch.so.2.4()(64bit) libtorch_cpu.so.2.4()(64bit) libtorch_cuda.so()(64bit) libtorch_python.so.2.4()(64bit) python(abi) = 3.12 python3.12dist(filelock) python3.12dist(fsspec) python3.12dist(jinja2) python3.12dist(networkx) python3.12dist(sympy) python3.12dist(typing-extensions) rtld(GNU_HASH)
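The three absolute-symlink warnings emitted while processing pytorch-python3 are worth fixing at the spec level: absolute link targets resolve outside the package tree when the RPM is installed into an alternate root (rpm --root, image builds), which is why rpmbuild flags them. A hypothetical %install fix-up, not taken from this spec (the macro spellings are the standard Fedora ones; the exact paths are inferred from the warnings above), letting GNU ln compute relative targets:

    # Replace each flagged absolute symlink with a relative one; --relative
    # makes ln compute the path from the link's directory to the target, so
    # the links stay valid under any install root.
    ln -sf --relative %{buildroot}%{_bindir}/torch_shm_manager %{buildroot}%{python3_sitearch}/torch/bin/torch_shm_manager
    ln -sf --relative %{buildroot}%{_includedir} %{buildroot}%{python3_sitearch}/torch/include
    ln -sf --relative %{buildroot}%{_libdir} %{buildroot}%{python3_sitearch}/torch/lib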
Checking for unpackaged file(s): /usr/lib/rpm/check-files /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64
Wrote: /builddir/build/RPMS/pytorch-devel-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64.rpm
Wrote: /builddir/build/RPMS/pytorch-python3-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64.rpm
Wrote: /builddir/build/RPMS/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64.rpm
Executing(%clean): /bin/sh -e /var/tmp/rpm-tmp.zImyjr
+ umask 022
+ cd /builddir/build/BUILD
+ cd pytorch
+ /usr/bin/rm -rf /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.aarch64
+ RPM_EC=0
++ jobs -p
+ exit 0
Executing(rmbuild): /bin/sh -e /var/tmp/rpm-tmp.QJzsM5
+ umask 022
+ cd /builddir/build/BUILD
+ rm -rf /builddir/build/BUILD/pytorch-SPECPARTS
+ rm -rf pytorch pytorch.gemspec
+ RPM_EC=0
++ jobs -p
+ exit 0
RPM build warnings:
    %patchN is deprecated (2 usages found), use %patch N (or %patch -P N)
    absolute symlink: /usr/lib64/python3.12/site-packages/torch/bin/torch_shm_manager -> /usr/bin/torch_shm_manager
    absolute symlink: /usr/lib64/python3.12/site-packages/torch/include -> /usr/include
    absolute symlink: /usr/lib64/python3.12/site-packages/torch/lib -> /usr/lib64
Finish: rpmbuild pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.src.rpm
Finish: build phase for pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.src.rpm
INFO: chroot_scan: 3 files copied to /var/lib/copr-rpmbuild/results/chroot_scan
INFO: /var/lib/mock/fedora-39-aarch64-1712885724.178146/root/var/log/dnf.rpm.log /var/lib/mock/fedora-39-aarch64-1712885724.178146/root/var/log/dnf.librepo.log /var/lib/mock/fedora-39-aarch64-1712885724.178146/root/var/log/dnf.log
INFO: Done(/var/lib/copr-rpmbuild/results/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc39.src.rpm) Config(child) 416 minutes 49 seconds
INFO: Results and/or logs in: /var/lib/copr-rpmbuild/results
INFO: Cleaning up build root ('cleanup_on_success=True')
Start: clean chroot
INFO: unmounting tmpfs.
Finish: clean chroot
Finish: run
Running RPMResults tool
Package info:
{
    "packages": [
        {
            "name": "pytorch",
            "epoch": null,
            "version": "2.4.0",
            "release": "20240412.0.git7efaf54d.cu12_3.fc39",
            "arch": "aarch64"
        },
        {
            "name": "pytorch",
            "epoch": null,
            "version": "2.4.0",
            "release": "20240412.0.git7efaf54d.cu12_3.fc39",
            "arch": "src"
        },
        {
            "name": "pytorch-devel",
            "epoch": null,
            "version": "2.4.0",
            "release": "20240412.0.git7efaf54d.cu12_3.fc39",
            "arch": "aarch64"
        },
        {
            "name": "pytorch-python3",
            "epoch": null,
            "version": "2.4.0",
            "release": "20240412.0.git7efaf54d.cu12_3.fc39",
            "arch": "aarch64"
        }
    ]
}
RPMResults finished
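On the %patchN warning in the summary above: RPM 4.18 and later deprecate fusing the patch number into the macro name and want it passed as an argument instead. A sketch of the spec-level change, with hypothetical patch numbers since the log does not say which two usages in pytorch.spec are affected:

    # Deprecated spelling (triggers the warning above):
    %patch0 -p1
    # Preferred spellings accepted by current RPM:
    %patch -P 0 -p1
    %patch -P 1 -p1

The fix is mechanical and silences the warning without changing which patches are applied or in what order.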