Warning: Permanently added '3.91.244.87' (ED25519) to the list of known hosts.

You can reproduce this build on your computer by running:

  sudo dnf install copr-rpmbuild
  /usr/bin/copr-rpmbuild --verbose --drop-resultdir --task-url https://copr.fedorainfracloud.org/backend/get-build-task/7299609-fedora-40-aarch64 --chroot fedora-40-aarch64

Version: 0.72
PID: 17584
Logging PID: 17585
Task:
{'allow_user_ssh': False, 'appstream': False, 'background': False, 'build_id': 7299609, 'buildroot_pkgs': [], 'chroot': 'fedora-40-aarch64', 'enable_net': True, 'fedora_review': False,
 'git_hash': 'fc3ea6d12e110fb301228cf1a067d84f30eacfd5', 'git_repo': 'https://copr-dist-git.fedorainfracloud.org/git/rezso/ML/pytorch', 'isolation': 'default', 'memory_reqs': 2048,
 'package_name': 'pytorch', 'package_version': '2.4.0-20240412.0.git7efaf54d.cu12_3', 'project_dirname': 'ML', 'project_name': 'ML', 'project_owner': 'rezso', 'repo_priority': None,
 'repos': [
   {'baseurl': 'https://download.copr.fedorainfracloud.org/results/rezso/ML/fedora-40-aarch64/', 'id': 'copr_base', 'name': 'Copr repository', 'priority': None},
   {'baseurl': 'https://download.copr.fedorainfracloud.org/results/rezso/CUDA/fedora-40-aarch64/', 'id': 'copr_rezso_CUDA', 'name': 'Additional repo copr_rezso_CUDA'},
   {'baseurl': 'http://developer.download.nvidia.com/compute/cuda/repos/rhel8/x86_64', 'id': 'http_developer_download_nvidia_com_compute_cuda_repos_rhel8_x86_64', 'name': 'Additional repo http_developer_download_nvidia_com_compute_cuda_repos_rhel8_x86_64'},
   {'baseurl': 'http://developer.download.nvidia.com/compute/cuda/repos/rhel8/sbsa', 'id': 'http_developer_download_nvidia_com_compute_cuda_repos_rhel8_sbsa', 'name': 'Additional repo http_developer_download_nvidia_com_compute_cuda_repos_rhel8_sbsa'},
   {'baseurl': 'http://developer.download.nvidia.com/compute/cuda/repos/rhel8/ppc64le', 'id': 'http_developer_download_nvidia_com_compute_cuda_repos_rhel8_ppc64le', 'name': 'Additional repo http_developer_download_nvidia_com_compute_cuda_repos_rhel8_ppc64le'}],
 'sandbox': 'rezso/ML--rezso', 'source_json': {}, 'source_type': None, 'ssh_public_keys': None, 'submitter': 'rezso', 'tags': [], 'task_id': '7299609-fedora-40-aarch64',
 'timeout': 172800, 'uses_devel_repo': False, 'with_opts': [], 'without_opts': []}

Running: git clone https://copr-dist-git.fedorainfracloud.org/git/rezso/ML/pytorch /var/lib/copr-rpmbuild/workspace/workdir-zwbyh1gj/pytorch --depth 500 --no-single-branch --recursive
cmd: ['git', 'clone', 'https://copr-dist-git.fedorainfracloud.org/git/rezso/ML/pytorch', '/var/lib/copr-rpmbuild/workspace/workdir-zwbyh1gj/pytorch', '--depth', '500', '--no-single-branch', '--recursive']
cwd: .
rc: 0
stdout:
stderr:
  Cloning into '/var/lib/copr-rpmbuild/workspace/workdir-zwbyh1gj/pytorch'...

Running: git checkout fc3ea6d12e110fb301228cf1a067d84f30eacfd5 --
cmd: ['git', 'checkout', 'fc3ea6d12e110fb301228cf1a067d84f30eacfd5', '--']
cwd: /var/lib/copr-rpmbuild/workspace/workdir-zwbyh1gj/pytorch
rc: 0
stdout:
stderr:
  Note: switching to 'fc3ea6d12e110fb301228cf1a067d84f30eacfd5'.

  You are in 'detached HEAD' state. You can look around, make experimental
  changes and commit them, and you can discard any commits you make in this
  state without impacting any branches by switching back to a branch.

  If you want to create a new branch to retain commits you create, you may
  do so (now or later) by using -c with the switch command. Example:

    git switch -c <new-branch-name>

  Or undo this operation with:

    git switch -

  Turn off this advice by setting config variable advice.detachedHead to false

  HEAD is now at fc3ea6d automatic import of pytorch

Running: copr-distgit-client sources
cmd: ['copr-distgit-client', 'sources']
cwd: /var/lib/copr-rpmbuild/workspace/workdir-zwbyh1gj/pytorch
rc: 0
stdout:
stderr:
  INFO: Reading stdout from command: git rev-parse --abbrev-ref HEAD
  INFO: Reading stdout from command: git rev-parse HEAD
  INFO: Reading sources specification file: sources

/usr/bin/tail: /var/lib/copr-rpmbuild/main.log: file truncated
Running (timeout=172800): unbuffer mock --spec /var/lib/copr-rpmbuild/workspace/workdir-zwbyh1gj/pytorch/pytorch.spec --sources /var/lib/copr-rpmbuild/workspace/workdir-zwbyh1gj/pytorch --resultdir /var/lib/copr-rpmbuild/results --uniqueext 1712885791.289313 -r /var/lib/copr-rpmbuild/results/configs/child.cfg
INFO: mock.py version 5.5 starting (python version = 3.12.1, NVR = mock-5.5-1.fc39), args: /usr/libexec/mock/mock --spec /var/lib/copr-rpmbuild/workspace/workdir-zwbyh1gj/pytorch/pytorch.spec --sources /var/lib/copr-rpmbuild/workspace/workdir-zwbyh1gj/pytorch --resultdir /var/lib/copr-rpmbuild/results --uniqueext 1712885791.289313 -r /var/lib/copr-rpmbuild/results/configs/child.cfg
Start(bootstrap): init plugins
INFO: tmpfs initialized
INFO: selinux enabled
INFO: chroot_scan: initialized
INFO: compress_logs: initialized
Finish(bootstrap): init plugins
Start: init plugins
INFO: tmpfs initialized
INFO: selinux enabled
INFO: chroot_scan: initialized
INFO: compress_logs: initialized
Finish: init plugins
INFO: Signal handler active
Start: run
INFO: Start(/var/lib/copr-rpmbuild/workspace/workdir-zwbyh1gj/pytorch/pytorch.spec) Config(fedora-40-aarch64)
Start: clean chroot
Finish: clean chroot
Mock Version: 5.5
INFO: Mock Version: 5.5
Start(bootstrap): chroot init
INFO: mounting tmpfs at /var/lib/mock/fedora-40-aarch64-bootstrap-1712885791.289313/root.
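The log header above spells out how to re-run this exact task. Below is a side-effect-free sketch of those steps: the package name, task URL, and chroot are quoted verbatim from the log, and the privileged commands are left commented out so the script only prints the task URL it would fetch.

```shell
#!/bin/sh
# Sketch of the reproduction steps quoted at the top of this log.
# BUILD_ID and CHROOT are copied from the log; nothing here is invented.
set -eu

BUILD_ID=7299609
CHROOT=fedora-40-aarch64
TASK_URL="https://copr.fedorainfracloud.org/backend/get-build-task/${BUILD_ID}-${CHROOT}"

# Print the task URL that copr-rpmbuild would be pointed at.
echo "$TASK_URL"

# The actual (privileged) steps, exactly as the log suggests:
# sudo dnf install copr-rpmbuild
# /usr/bin/copr-rpmbuild --verbose --drop-resultdir --task-url "$TASK_URL" --chroot "$CHROOT"
```

Note that an aarch64 build like this one needs an aarch64 host (or emulation); the task URL encodes both the build id and the chroot.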
INFO: calling preinit hooks
INFO: enabled root cache
INFO: enabled package manager cache
Start(bootstrap): cleaning package manager metadata
Finish(bootstrap): cleaning package manager metadata
INFO: Guessed host environment type: unknown
INFO: Using bootstrap image: registry.fedoraproject.org/fedora:40
INFO: Pulling image: registry.fedoraproject.org/fedora:40
INFO: Copy content of container registry.fedoraproject.org/fedora:40 to /var/lib/mock/fedora-40-aarch64-bootstrap-1712885791.289313/root
INFO: Checking that registry.fedoraproject.org/fedora:40 image matches host's architecture
INFO: mounting registry.fedoraproject.org/fedora:40 with podman image mount
INFO: image registry.fedoraproject.org/fedora:40 as /var/lib/containers/storage/overlay/bec4a40577d52e0fcb5d2bd112793ce75e7585454a079277c9524526296e8734/merged
INFO: umounting image registry.fedoraproject.org/fedora:40 (/var/lib/containers/storage/overlay/bec4a40577d52e0fcb5d2bd112793ce75e7585454a079277c9524526296e8734/merged) with podman image umount
INFO: Using 'dnf' instead of 'dnf5' for bootstrap chroot
INFO: Package manager dnf detected and used (fallback)
INFO: Bootstrap image not marked ready
Start(bootstrap): installing dnf5 tooling
No matches found for the following disable plugin patterns: local, spacewalk, versionlock
Copr repository                                  3.7 MB/s | 157 kB     00:00
Additional repo copr_rezso_CUDA                  1.3 MB/s |  38 kB     00:00
Additional repo http_developer_download_nvidia_   50 MB/s | 713 kB     00:00
Additional repo http_developer_download_nvidia_   33 MB/s | 448 kB     00:00
Additional repo http_developer_download_nvidia_   39 MB/s | 433 kB     00:00
fedora                                            33 MB/s |  19 MB     00:00
updates                                          1.1 kB/s | 134  B     00:00
Dependencies resolved.
================================================================================
 Package          Architecture   Version            Repository           Size
================================================================================
Installing:
 dnf5             aarch64        5.1.15-1.fc40      fedora              567 k
 dnf5-plugins     aarch64        5.1.15-1.fc40      fedora              333 k
Installing dependencies:
 fmt              aarch64        10.2.1-4.fc40      fedora              121 k
 libdnf5          aarch64        5.1.15-1.fc40      fedora              905 k
 libdnf5-cli      aarch64        5.1.15-1.fc40      fedora              217 k
 sdbus-cpp        aarch64        1.4.0-2.fc40       fedora              101 k
 systemd-libs     aarch64        255.4-1.fc40       fedora              694 k

Transaction Summary
================================================================================
Install  7 Packages

Total download size: 2.9 M
Installed size: 9.5 M
Downloading Packages:
(1/7): dnf5-5.1.15-1.fc40.aarch64.rpm             33 MB/s | 567 kB     00:00
(2/7): dnf5-plugins-5.1.15-1.fc40.aarch64.rpm     18 MB/s | 333 kB     00:00
(3/7): fmt-10.2.1-4.fc40.aarch64.rpm             6.0 MB/s | 121 kB     00:00
(4/7): libdnf5-cli-5.1.15-1.fc40.aarch64.rpm      28 MB/s | 217 kB     00:00
(5/7): sdbus-cpp-1.4.0-2.fc40.aarch64.rpm        7.4 MB/s | 101 kB     00:00
(6/7): libdnf5-5.1.15-1.fc40.aarch64.rpm          43 MB/s | 905 kB     00:00
(7/7): systemd-libs-255.4-1.fc40.aarch64.rpm      49 MB/s | 694 kB     00:00
--------------------------------------------------------------------------------
Total                                            8.5 MB/s | 2.9 MB     00:00
Running transaction check
Transaction check succeeded.
Running transaction test
Transaction test succeeded.
Running transaction
  Preparing        :                                                        1/1
  Installing       : fmt-10.2.1-4.fc40.aarch64                              1/7
  Installing       : libdnf5-5.1.15-1.fc40.aarch64                          2/7
  Installing       : libdnf5-cli-5.1.15-1.fc40.aarch64                      3/7
  Installing       : dnf5-5.1.15-1.fc40.aarch64                             4/7
  Installing       : systemd-libs-255.4-1.fc40.aarch64                      5/7
  Installing       : sdbus-cpp-1.4.0-2.fc40.aarch64                         6/7
  Installing       : dnf5-plugins-5.1.15-1.fc40.aarch64                     7/7
  Running scriptlet: dnf5-plugins-5.1.15-1.fc40.aarch64                     7/7

Installed:
  dnf5-5.1.15-1.fc40.aarch64
  dnf5-plugins-5.1.15-1.fc40.aarch64
  fmt-10.2.1-4.fc40.aarch64
  libdnf5-5.1.15-1.fc40.aarch64
  libdnf5-cli-5.1.15-1.fc40.aarch64
  sdbus-cpp-1.4.0-2.fc40.aarch64
  systemd-libs-255.4-1.fc40.aarch64

Complete!
INFO: Switching package manager from dnf to the dnf5 (direct choice)
Finish(bootstrap): installing dnf5 tooling
Start(bootstrap): creating root cache
Finish(bootstrap): creating root cache
Finish(bootstrap): chroot init
Start: chroot init
INFO: mounting tmpfs at /var/lib/mock/fedora-40-aarch64-1712885791.289313/root.
INFO: calling preinit hooks
INFO: enabled root cache
INFO: enabled package manager cache
Start: cleaning package manager metadata
Finish: cleaning package manager metadata
INFO: enabled HW Info plugin
INFO: Package manager dnf5 detected and used (direct choice)
INFO: Buildroot is handled by package management downloaded with a bootstrap image:
  rpm-4.19.1.1-1.fc40.aarch64
  rpm-sequoia-1.6.0-2.fc40.aarch64
  python3-dnf-4.19.0-1.fc40.noarch
  yum-4.19.0-1.fc40.noarch
  dnf5-5.1.15-1.fc40.aarch64
  dnf5-plugins-5.1.15-1.fc40.aarch64
Start: installing minimal buildroot with dnf5
Updating and loading repositories:
 fedora                                 100% |  34.0 MiB/s |  19.9 MiB | 00m01s
 updates                                100% | 236.6 KiB/s |  16.3 KiB | 00m00s
 Copr repository                        100% |   5.7 MiB/s | 158.1 KiB | 00m00s
 Additional repo copr_rezso_CUDA        100% |   1.5 MiB/s |  39.4 KiB | 00m00s
 Additional repo http_developer_downloa 100% |  71.0 MiB/s | 727.0 KiB | 00m00s
 Additional repo http_developer_downloa 100% |  49.9 MiB/s | 460.2 KiB | 00m00s
 Additional repo http_developer_downloa 100% |  57.8 MiB/s | 532.9 KiB | 00m00s
Repositories loaded.
 Package                        Arch    Version                    Repository       Size
Installing group/module packages:
 bash                           aarch64 5.2.26-3.fc40              fedora        8.3 MiB
 bzip2                          aarch64 1.0.8-18.fc40              fedora      427.5 KiB
 coreutils                      aarch64 9.4-6.fc40                 fedora       20.8 MiB
 cpio                           aarch64 2.15-1.fc40                fedora        1.2 MiB
 diffutils                      aarch64 3.10-5.fc40                fedora        2.1 MiB
 fedora-release-common          noarch  40-38                      fedora       19.1 KiB
 findutils                      aarch64 1:4.9.0-8.fc40             fedora        1.7 MiB
 gawk                           aarch64 5.3.0-3.fc40               fedora        4.2 MiB
 glibc-minimal-langpack         aarch64 2.39.9999-99.fc40          copr_base     0.0 B
 grep                           aarch64 3.11-7.fc40                fedora        1.1 MiB
 gzip                           aarch64 1.13-1.fc40                fedora      488.8 KiB
 info                           aarch64 7.1-2.fc40                 fedora      613.5 KiB
 patch                          aarch64 2.7.6-24.fc40              fedora      390.5 KiB
 redhat-rpm-config              noarch  286-1.fc40                 fedora      185.2 KiB
 rpm-build                      aarch64 4.19.1.1-1.fc40            fedora        1.2 MiB
 sed                            aarch64 4.9-1.fc40                 fedora        1.0 MiB
 shadow-utils                   aarch64 2:4.15.1-1.fc40            fedora        7.3 MiB
 tar                            aarch64 2:1.35-3.fc40              fedora        3.1 MiB
 unzip                          aarch64 6.0-63.fc40                fedora      726.4 KiB
 util-linux                     aarch64 2.40-0.9.rc1.fc40          fedora       17.4 MiB
 which                          aarch64 2.21-41.fc40               fedora      248.1 KiB
 xz                             aarch64 1:5.4.6-3.fc40             fedora        2.3 MiB
Installing dependencies:
 alternatives                   aarch64 1.26-3.fc40                fedora      218.2 KiB
 ansible-srpm-macros            noarch  1-14.fc40                  fedora       35.7 KiB
 audit-libs                     aarch64 4.0.1-1.fc40               fedora      547.2 KiB
 authselect                     aarch64 1.5.0-5.fc40               fedora      309.4 KiB
 authselect-libs                aarch64 1.5.0-5.fc40               fedora      931.8 KiB
 basesystem                     noarch  11-20.fc40                 fedora        0.0 B
 binutils                       aarch64 2.41-34.fc40               fedora       32.8 MiB
 binutils-gold                  aarch64 2.41-34.fc40               fedora        3.1 MiB
 bzip2-libs                     aarch64 1.0.8-18.fc40              fedora      200.7 KiB
 ca-certificates                noarch  2023.2.62_v7.0.401-6.fc40  fedora        2.3 MiB
 coreutils-common               aarch64 9.4-6.fc40                 fedora       11.4 MiB
 cracklib                       aarch64 2.9.11-5.fc40              fedora      934.6 KiB
 crypto-policies                noarch  20240201-2.git9f501f3.fc40 fedora      149.3 KiB
 curl                           aarch64 8.6.0-7.fc40               fedora      866.6 KiB
 cyrus-sasl-lib                 aarch64 2.1.28-19.fc40             fedora        3.1 MiB
 debugedit                      aarch64 5.0-14.fc40                fedora      498.8 KiB
 dwz                            aarch64 0.15-6.fc40                fedora      386.7 KiB
 ed                             aarch64 1.20.1-1.fc40              fedora      282.4 KiB
 efi-srpm-macros                noarch  5-11.fc40                  fedora       40.1 KiB
 elfutils                       aarch64 0.191-4.fc40               fedora        5.0 MiB
 elfutils-debuginfod-client     aarch64 0.191-4.fc40               fedora      396.7 KiB
 elfutils-default-yama-scope    noarch  0.191-4.fc40               fedora        1.8 KiB
 elfutils-libelf                aarch64 0.191-4.fc40               fedora        1.3 MiB
 elfutils-libs                  aarch64 0.191-4.fc40               fedora        1.0 MiB
 fedora-gpg-keys                noarch  40-1                       fedora      125.0 KiB
 fedora-release                 noarch  40-38                      fedora        0.0 B
 fedora-release-identity-basic  noarch  40-38                      fedora      654.0 B
 fedora-repos                   noarch  40-1                       fedora        4.9 KiB
 file                           aarch64 5.45-4.fc40                fedora      267.4 KiB
 file-libs                      aarch64 5.45-4.fc40                fedora       10.0 MiB
 filesystem                     aarch64 3.18-8.fc40                fedora      106.0 B
 fonts-srpm-macros              noarch  1:2.0.5-14.fc40            fedora       55.3 KiB
 forge-srpm-macros              noarch  0.2.0-3.fc40               fedora       37.4 KiB
 fpc-srpm-macros                noarch  1.3-12.fc40                fedora      144.0 B
 gdb-minimal                    aarch64 14.2-1.fc40                fedora       12.7 MiB
 gdbm                           aarch64 1:1.23-6.fc40              fedora      928.2 KiB
 gdbm-libs                      aarch64 1:1.23-6.fc40              fedora      425.8 KiB
 ghc-srpm-macros                noarch  1.9-1.fc40                 fedora      716.0 B
 glibc                          aarch64 2.39.9999-99.fc40          copr_base     9.7 MiB
 glibc-common                   aarch64 2.39.9999-99.fc40          copr_base     2.6 MiB
 glibc-gconv-extra              aarch64 2.39.9999-99.fc40          copr_base    49.0 MiB
 gmp                            aarch64 1:6.2.1-8.fc40             fedora      721.2 KiB
 gnat-srpm-macros               noarch  6-5.fc40                   fedora        1.0 KiB
 go-srpm-macros                 noarch  3.5.0-1.fc40               fedora       60.6 KiB
 jansson                        aarch64 2.13.1-9.fc40              fedora      220.4 KiB
 kernel-srpm-macros             noarch  1.0-23.fc40                fedora        1.9 KiB
 keyutils-libs                  aarch64 1.6.3-3.fc40               fedora      226.3 KiB
 krb5-libs                      aarch64 1.21.2-5.fc40              fedora        3.4 MiB
 libacl                         aarch64 2.3.2-1.fc40               fedora      196.0 KiB
 libarchive                     aarch64 3.7.2-3.fc40               fedora        1.0 MiB
 libattr                        aarch64 2.5.2-3.fc40               fedora      196.5 KiB
 libblkid                       aarch64 2.40-0.9.rc1.fc40          fedora      392.9 KiB
 libbrotli                      aarch64 1.1.0-3.fc40               fedora        1.1 MiB
 libcap                         aarch64 2.69-3.fc40                fedora        1.4 MiB
 libcap-ng                      aarch64 0.8.4-4.fc40               fedora      417.0 KiB
 libcom_err                     aarch64 1.47.0-5.fc40              fedora      239.2 KiB
 libcurl                        aarch64 8.6.0-7.fc40               fedora      856.6 KiB
 libeconf                       aarch64 0.6.2-1.fc40               fedora      206.0 KiB
 libevent                       aarch64 2.1.12-12.fc40             fedora        1.5 MiB
 libfdisk                       aarch64 2.40-0.9.rc1.fc40          fedora      483.2 KiB
 libffi                         aarch64 3.4.4-7.fc40               fedora      281.4 KiB
 libgcc                         aarch64 14.0.1-0.13.fc40           fedora      350.1 KiB
 libgomp                        aarch64 14.0.1-0.13.fc40           fedora      566.7 KiB
 libidn2                        aarch64 2.3.7-1.fc40               fedora      457.1 KiB
 libmount                       aarch64 2.40-0.9.rc1.fc40          fedora      484.2 KiB
 libnghttp2                     aarch64 1.59.0-2.fc40              fedora      262.1 KiB
 libnsl2                        aarch64 2.0.1-1.fc40               fedora      221.9 KiB
 libpkgconf                     aarch64 2.1.0-1.fc40               fedora      198.0 KiB
 libpsl                         aarch64 0.21.5-3.fc40              fedora      196.5 KiB
 libpwquality                   aarch64 1.4.5-9.fc40               fedora        1.1 MiB
 libselinux                     aarch64 3.6-4.fc40                 fedora      265.1 KiB
 libsemanage                    aarch64 3.6-3.fc40                 fedora      361.4 KiB
 libsepol                       aarch64 3.6-3.fc40                 fedora      874.0 KiB
 libsmartcols                   aarch64 2.40-0.9.rc1.fc40          fedora      288.8 KiB
 libssh                         aarch64 0.10.6-5.fc40              fedora      581.1 KiB
 libssh-config                  noarch  0.10.6-5.fc40              fedora      277.0 B
 libstdc++                      aarch64 14.0.1-0.13.fc40           fedora        2.8 MiB
 libtasn1                       aarch64 4.19.0-6.fc40              fedora      283.7 KiB
 libtirpc                       aarch64 1.3.4-1.rc3.fc40           fedora      274.6 KiB
 libtool-ltdl                   aarch64 2.4.7-10.fc40              fedora      222.2 KiB
 libunistring                   aarch64 1.1-7.fc40                 fedora        1.9 MiB
 libutempter                    aarch64 1.2.1-13.fc40              fedora      417.6 KiB
 libuuid                        aarch64 2.40-0.9.rc1.fc40          fedora      197.6 KiB
 libverto                       aarch64 0.3.2-8.fc40               fedora      197.4 KiB
 libxcrypt                      aarch64 4.4.36-5.fc40              fedora      398.9 KiB
 libxml2                        aarch64 2.12.5-1.fc40              fedora        2.2 MiB
 libzstd                        aarch64 1.5.5-5.fc40               fedora      795.8 KiB
 lua-libs                       aarch64 5.4.6-5.fc40               fedora      393.0 KiB
 lua-srpm-macros                noarch  1-13.fc40                  fedora        1.3 KiB
 lz4-libs                       aarch64 1.9.4-6.fc40               fedora      261.4 KiB
 mpfr                           aarch64 4.2.1-3.fc40               fedora      818.7 KiB
 ncurses-base                   noarch  6.4-12.20240127.fc40       fedora      326.2 KiB
 ncurses-libs                   aarch64 6.4-12.20240127.fc40       fedora        2.2 MiB
 ocaml-srpm-macros              noarch  9-3.fc40                   fedora        1.9 KiB
 openblas-srpm-macros           noarch  2-16.fc40                  fedora      104.0 B
 openldap                       aarch64 2.6.7-1.fc40               fedora        1.0 MiB
 openssl-libs                   aarch64 1:3.2.1-2.fc40             fedora        7.8 MiB
 p11-kit                        aarch64 0.25.3-4.fc40              fedora        2.8 MiB
 p11-kit-trust                  aarch64 0.25.3-4.fc40              fedora      655.4 KiB
 package-notes-srpm-macros      noarch  0.5-11.fc40                fedora        1.6 KiB
 pam                            aarch64 1.6.0-2.fc40               fedora       11.0 MiB
 pam-libs                       aarch64 1.6.0-2.fc40               fedora      606.9 KiB
 pcre2                          aarch64 10.42-2.fc40.2             fedora      905.6 KiB
 pcre2-syntax                   noarch  10.42-2.fc40.2             fedora      235.1 KiB
 perl-srpm-macros               noarch  1-53.fc40                  fedora      861.0 B
 pkgconf                        aarch64 2.1.0-1.fc40               fedora      238.3 KiB
 pkgconf-m4                     noarch  2.1.0-1.fc40               fedora       13.9 KiB
 pkgconf-pkg-config             aarch64 2.1.0-1.fc40               fedora      990.0 B
 popt                           aarch64 1.19-6.fc40                fedora      272.8 KiB
 publicsuffix-list-dafsa        noarch  20240107-3.fc40            fedora       67.5 KiB
 pyproject-srpm-macros          noarch  1.12.0-1.fc40              fedora        1.5 KiB
 python-srpm-macros             noarch  3.12-7.fc40                fedora       50.1 KiB
 qt5-srpm-macros                noarch  5.15.13-1.fc40             fedora      492.0 B
 qt6-srpm-macros                noarch  6.6.2-1.fc40               fedora      456.0 B
 readline                       aarch64 8.2-8.fc40                 fedora      689.1 KiB
 rpm                            aarch64 4.19.1.1-1.fc40            fedora        4.0 MiB
 rpm-build-libs                 aarch64 4.19.1.1-1.fc40            fedora      262.4 KiB
 rpm-libs                       aarch64 4.19.1.1-1.fc40            fedora      861.6 KiB
 rpm-sequoia                    aarch64 1.6.0-2.fc40               fedora        2.2 MiB
 rust-srpm-macros               noarch  26.2-1.fc40                fedora        4.8 KiB
 setup                          noarch  2.14.5-2.fc40              fedora      720.4 KiB
 sqlite-libs                    aarch64 3.45.1-2.fc40              fedora        1.5 MiB
 systemd-libs                   aarch64 255.4-1.fc40               fedora        2.5 MiB
 util-linux-core                aarch64 2.40-0.9.rc1.fc40          fedora        6.1 MiB
 xxhash-libs                    aarch64 0.8.2-2.fc40               fedora      212.2 KiB
 xz-libs                        aarch64 1:5.4.6-3.fc40             fedora      265.6 KiB
 zig-srpm-macros                noarch  1-2.fc40                   fedora        1.1 KiB
 zip                            aarch64 3.0-40.fc40                fedora        1.1 MiB
 zlib-ng-compat                 aarch64 2.1.6-2.fc40               fedora      261.7 KiB
 zstd                           aarch64 1.5.5-5.fc40               fedora        1.6 MiB
Installing groups:
 Buildsystem building group

Transaction Summary:
 Installing:        152 packages

Total size of inbound packages is 53 MiB. Need to download 53 MiB.
After this operation 306 MiB will be used (install 306 MiB, remove 0 B).
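With the buildroot resolved, mock performs the actual build from the spec and sources. A hypothetical local equivalent of the `mock` invocation quoted earlier in this log, as a sketch that only prints the command it would run: the workspace paths are the builder's own (a local run would substitute a local checkout), and `-r fedora-40-aarch64` is an assumption standing in for the builder's generated child.cfg.

```shell
#!/bin/sh
# Sketch only: compose the mock command line mirroring the one in this log.
set -eu

SPEC=/var/lib/copr-rpmbuild/workspace/workdir-zwbyh1gj/pytorch/pytorch.spec
SOURCES=/var/lib/copr-rpmbuild/workspace/workdir-zwbyh1gj/pytorch
RESULTDIR=/var/lib/copr-rpmbuild/results

CMD="mock -r fedora-40-aarch64 --spec $SPEC --sources $SOURCES --resultdir $RESULTDIR"
echo "$CMD"
# Uncomment to actually run (requires the mock package and membership in the
# mock group; this is destructive/long-running):
# $CMD
```

The `--spec`/`--sources` pair tells mock to build the SRPM itself before rebuilding it in the chroot, which is exactly what the Copr builder does here.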
[ 1/152] bzip2-0:1.0.8-18.fc40.aarch64 100% | 4.2 MiB/s | 52.2 KiB | 00m00s [ 2/152] cpio-0:2.15-1.fc40.aarch64 100% | 71.3 MiB/s | 291.9 KiB | 00m00s [ 3/152] coreutils-0:9.4-6.fc40.aarch6 100% | 54.1 MiB/s | 1.2 MiB | 00m00s [ 4/152] bash-0:5.2.26-3.fc40.aarch64 100% | 66.6 MiB/s | 1.8 MiB | 00m00s [ 5/152] fedora-release-common-0:40-38 100% | 3.5 MiB/s | 21.3 KiB | 00m00s [ 6/152] diffutils-0:3.10-5.fc40.aarch 100% | 28.2 MiB/s | 404.0 KiB | 00m00s [ 7/152] findutils-1:4.9.0-8.fc40.aarc 100% | 97.4 MiB/s | 498.6 KiB | 00m00s [ 8/152] gawk-0:5.3.0-3.fc40.aarch64 100% | 106.0 MiB/s | 1.1 MiB | 00m00s [ 9/152] gzip-0:1.13-1.fc40.aarch64 100% | 23.7 MiB/s | 169.8 KiB | 00m00s [ 10/152] grep-0:3.11-7.fc40.aarch64 100% | 29.2 MiB/s | 298.5 KiB | 00m00s [ 11/152] info-0:7.1-2.fc40.aarch64 100% | 59.6 MiB/s | 183.1 KiB | 00m00s [ 12/152] redhat-rpm-config-0:286-1.fc4 100% | 40.5 MiB/s | 83.0 KiB | 00m00s [ 13/152] patch-0:2.7.6-24.fc40.aarch64 100% | 31.6 MiB/s | 129.5 KiB | 00m00s [ 14/152] rpm-build-0:4.19.1.1-1.fc40.a 100% | 19.5 MiB/s | 79.7 KiB | 00m00s [ 15/152] sed-0:4.9-1.fc40.aarch64 100% | 51.4 MiB/s | 315.7 KiB | 00m00s [ 16/152] shadow-utils-2:4.15.1-1.fc40. 
100% | 188.9 MiB/s | 1.3 MiB | 00m00s [ 17/152] tar-2:1.35-3.fc40.aarch64 100% | 104.7 MiB/s | 857.5 KiB | 00m00s [ 18/152] unzip-0:6.0-63.fc40.aarch64 100% | 25.8 MiB/s | 185.0 KiB | 00m00s [ 19/152] util-linux-0:2.40-0.9.rc1.fc4 100% | 122.9 MiB/s | 1.2 MiB | 00m00s [ 20/152] xz-1:5.4.6-3.fc40.aarch64 100% | 68.1 MiB/s | 558.0 KiB | 00m00s [ 21/152] which-0:2.21-41.fc40.aarch64 100% | 3.4 MiB/s | 41.6 KiB | 00m00s [ 22/152] glibc-minimal-langpack-0:2.39 100% | 12.0 MiB/s | 98.7 KiB | 00m00s [ 23/152] ncurses-libs-0:6.4-12.2024012 100% | 64.3 MiB/s | 329.1 KiB | 00m00s [ 24/152] filesystem-0:3.18-8.fc40.aarc 100% | 90.5 MiB/s | 1.1 MiB | 00m00s [ 25/152] bzip2-libs-0:1.0.8-18.fc40.aa 100% | 4.6 MiB/s | 42.7 KiB | 00m00s [ 26/152] gmp-1:6.2.1-8.fc40.aarch64 100% | 43.6 MiB/s | 267.6 KiB | 00m00s [ 27/152] libacl-0:2.3.2-1.fc40.aarch64 100% | 2.4 MiB/s | 24.7 KiB | 00m00s [ 28/152] libcap-0:2.69-3.fc40.aarch64 100% | 10.2 MiB/s | 83.6 KiB | 00m00s [ 29/152] coreutils-common-0:9.4-6.fc40 100% | 76.8 MiB/s | 2.2 MiB | 00m00s [ 30/152] libattr-0:2.5.2-3.fc40.aarch6 100% | 1.0 MiB/s | 18.0 KiB | 00m00s [ 31/152] libselinux-0:3.6-4.fc40.aarch 100% | 21.5 MiB/s | 87.9 KiB | 00m00s [ 32/152] fedora-repos-0:40-1.noarch 100% | 1.5 MiB/s | 9.4 KiB | 00m00s [ 33/152] openssl-libs-1:3.2.1-2.fc40.a 100% | 224.5 MiB/s | 2.2 MiB | 00m00s [ 34/152] mpfr-0:4.2.1-3.fc40.aarch64 100% | 31.7 MiB/s | 324.2 KiB | 00m00s [ 35/152] readline-0:8.2-8.fc40.aarch64 100% | 52.1 MiB/s | 213.5 KiB | 00m00s [ 36/152] pcre2-0:10.42-2.fc40.2.aarch6 100% | 72.1 MiB/s | 221.6 KiB | 00m00s [ 37/152] ed-0:1.20.1-1.fc40.aarch64 100% | 13.2 MiB/s | 81.2 KiB | 00m00s [ 38/152] ansible-srpm-macros-0:1-14.fc 100% | 4.1 MiB/s | 20.8 KiB | 00m00s [ 39/152] dwz-0:0.15-6.fc40.aarch64 100% | 19.1 MiB/s | 136.6 KiB | 00m00s [ 40/152] file-0:5.45-4.fc40.aarch64 100% | 8.1 MiB/s | 49.5 KiB | 00m00s [ 41/152] efi-srpm-macros-0:5-11.fc40.n 100% | 3.1 MiB/s | 22.3 KiB | 00m00s [ 42/152] fonts-srpm-macros-1:2.0.5-14. 
100% | 6.5 MiB/s | 26.5 KiB | 00m00s [ 43/152] fpc-srpm-macros-0:1.3-12.fc40 100% | 783.3 KiB/s | 7.8 KiB | 00m00s [ 44/152] ghc-srpm-macros-0:1.9-1.fc40. 100% | 583.3 KiB/s | 8.7 KiB | 00m00s [ 45/152] forge-srpm-macros-0:0.2.0-3.f 100% | 981.8 KiB/s | 18.7 KiB | 00m00s [ 46/152] gnat-srpm-macros-0:6-5.fc40.n 100% | 980.4 KiB/s | 8.8 KiB | 00m00s [ 47/152] kernel-srpm-macros-0:1.0-23.f 100% | 3.2 MiB/s | 9.7 KiB | 00m00s [ 48/152] go-srpm-macros-0:3.5.0-1.fc40 100% | 4.5 MiB/s | 27.5 KiB | 00m00s [ 49/152] lua-srpm-macros-0:1-13.fc40.n 100% | 2.8 MiB/s | 8.7 KiB | 00m00s [ 50/152] openblas-srpm-macros-0:2-16.f 100% | 7.3 MiB/s | 7.5 KiB | 00m00s [ 51/152] ocaml-srpm-macros-0:9-3.fc40. 100% | 4.4 MiB/s | 9.1 KiB | 00m00s [ 52/152] perl-srpm-macros-0:1-53.fc40. 100% | 4.1 MiB/s | 8.4 KiB | 00m00s [ 53/152] package-notes-srpm-macros-0:0 100% | 3.2 MiB/s | 9.9 KiB | 00m00s [ 54/152] pyproject-srpm-macros-0:1.12. 100% | 4.4 MiB/s | 13.6 KiB | 00m00s [ 55/152] qt5-srpm-macros-0:5.15.13-1.f 100% | 8.3 MiB/s | 8.5 KiB | 00m00s [ 56/152] python-srpm-macros-0:3.12-7.f 100% | 5.8 MiB/s | 23.8 KiB | 00m00s [ 57/152] qt6-srpm-macros-0:6.6.2-1.fc4 100% | 4.3 MiB/s | 8.9 KiB | 00m00s [ 58/152] rpm-0:4.19.1.1-1.fc40.aarch64 100% | 131.0 MiB/s | 536.7 KiB | 00m00s [ 59/152] rust-srpm-macros-0:26.2-1.fc4 100% | 4.1 MiB/s | 12.6 KiB | 00m00s [ 60/152] zig-srpm-macros-0:1-2.fc40.no 100% | 3.9 MiB/s | 8.0 KiB | 00m00s [ 61/152] zip-0:3.0-40.fc40.aarch64 100% | 85.7 MiB/s | 263.3 KiB | 00m00s [ 62/152] elfutils-0:0.191-4.fc40.aarch 100% | 50.4 MiB/s | 568.2 KiB | 00m00s [ 63/152] debugedit-0:5.0-14.fc40.aarch 100% | 4.5 MiB/s | 78.5 KiB | 00m00s [ 64/152] pkgconf-pkg-config-0:2.1.0-1. 
100% | 2.4 MiB/s | 9.7 KiB | 00m00s [ 65/152] elfutils-libelf-0:0.191-4.fc4 100% | 25.5 MiB/s | 208.9 KiB | 00m00s [ 66/152] binutils-0:2.41-34.fc40.aarch 100% | 199.6 MiB/s | 6.8 MiB | 00m00s [ 67/152] popt-0:1.19-6.fc40.aarch64 100% | 4.1 MiB/s | 66.7 KiB | 00m00s [ 68/152] rpm-build-libs-0:4.19.1.1-1.f 100% | 6.0 MiB/s | 91.8 KiB | 00m00s [ 69/152] rpm-libs-0:4.19.1.1-1.fc40.aa 100% | 42.7 MiB/s | 306.0 KiB | 00m00s [ 70/152] zstd-0:1.5.5-5.fc40.aarch64 100% | 55.4 MiB/s | 453.6 KiB | 00m00s [ 71/152] audit-libs-0:4.0.1-1.fc40.aar 100% | 11.2 MiB/s | 126.1 KiB | 00m00s [ 72/152] libeconf-0:0.6.2-1.fc40.aarch 100% | 2.8 MiB/s | 32.1 KiB | 00m00s [ 73/152] libsemanage-0:3.6-3.fc40.aarc 100% | 14.0 MiB/s | 114.9 KiB | 00m00s [ 74/152] pam-libs-0:1.6.0-2.fc40.aarch 100% | 18.6 MiB/s | 57.2 KiB | 00m00s [ 75/152] libxcrypt-0:4.4.36-5.fc40.aar 100% | 20.1 MiB/s | 123.3 KiB | 00m00s [ 76/152] authselect-libs-0:1.5.0-5.fc4 100% | 71.2 MiB/s | 218.8 KiB | 00m00s [ 77/152] setup-0:2.14.5-2.fc40.noarch 100% | 37.8 MiB/s | 154.7 KiB | 00m00s [ 78/152] libblkid-0:2.40-0.9.rc1.fc40. 100% | 38.2 MiB/s | 117.5 KiB | 00m00s [ 79/152] libcap-ng-0:0.8.4-4.fc40.aarc 100% | 7.9 MiB/s | 32.5 KiB | 00m00s [ 80/152] libmount-0:2.40-0.9.rc1.fc40. 100% | 25.2 MiB/s | 154.9 KiB | 00m00s [ 81/152] libfdisk-0:2.40-0.9.rc1.fc40. 
100% | 17.2 MiB/s | 158.3 KiB | 00m00s [ 82/152] libsmartcols-0:2.40-0.9.rc1.f 100% | 16.2 MiB/s | 83.2 KiB | 00m00s [ 83/152] libutempter-0:1.2.1-13.fc40.a 100% | 6.6 MiB/s | 26.8 KiB | 00m00s [ 84/152] libuuid-0:2.40-0.9.rc1.fc40.a 100% | 14.2 MiB/s | 29.1 KiB | 00m00s [ 85/152] pam-0:1.6.0-2.fc40.aarch64 100% | 137.3 MiB/s | 562.5 KiB | 00m00s [ 86/152] systemd-libs-0:255.4-1.fc40.a 100% | 96.8 MiB/s | 694.2 KiB | 00m00s [ 87/152] util-linux-core-0:2.40-0.9.rc 100% | 55.8 MiB/s | 513.9 KiB | 00m00s [ 88/152] zlib-ng-compat-0:2.1.6-2.fc40 100% | 8.1 MiB/s | 66.3 KiB | 00m00s [ 89/152] xz-libs-1:5.4.6-3.fc40.aarch6 100% | 21.1 MiB/s | 108.3 KiB | 00m00s [ 90/152] ncurses-base-0:6.4-12.2024012 100% | 86.7 MiB/s | 88.8 KiB | 00m00s [ 91/152] libgcc-0:14.0.1-0.13.fc40.aar 100% | 46.7 MiB/s | 95.7 KiB | 00m00s [ 92/152] glibc-common-0:2.39.9999-99.f 100% | 40.6 MiB/s | 374.3 KiB | 00m00s [ 93/152] libsepol-0:3.6-3.fc40.aarch64 100% | 45.6 MiB/s | 326.7 KiB | 00m00s [ 94/152] ca-certificates-0:2023.2.62_v 100% | 140.3 MiB/s | 862.1 KiB | 00m00s [ 95/152] crypto-policies-0:20240201-2. 100% | 24.4 MiB/s | 99.8 KiB | 00m00s [ 96/152] fedora-gpg-keys-0:40-1.noarch 100% | 32.2 MiB/s | 132.0 KiB | 00m00s [ 97/152] pcre2-syntax-0:10.42-2.fc40.2 100% | 34.8 MiB/s | 142.7 KiB | 00m00s [ 98/152] file-libs-0:5.45-4.fc40.aarch 100% | 149.1 MiB/s | 763.3 KiB | 00m00s [ 99/152] curl-0:8.6.0-7.fc40.aarch64 100% | 58.8 MiB/s | 301.3 KiB | 00m00s [100/152] libarchive-0:3.7.2-3.fc40.aar 100% | 98.7 MiB/s | 404.4 KiB | 00m00s [101/152] alternatives-0:1.26-3.fc40.aa 100% | 9.5 MiB/s | 38.8 KiB | 00m00s [102/152] glibc-0:2.39.9999-99.fc40.aar 100% | 47.1 MiB/s | 1.8 MiB | 00m00s [103/152] elfutils-debuginfod-client-0: 100% | 7.4 MiB/s | 38.0 KiB | 00m00s [104/152] binutils-gold-0:2.41-34.fc40. 100% | 94.3 MiB/s | 965.9 KiB | 00m00s [105/152] jansson-0:2.13.1-9.fc40.aarch 100% | 5.6 MiB/s | 45.8 KiB | 00m00s [106/152] libstdc++-0:14.0.1-0.13.fc40. 
100% | 89.5 MiB/s | 825.1 KiB | 00m00s [107/152] libzstd-0:1.5.5-5.fc40.aarch6 100% | 69.5 MiB/s | 284.7 KiB | 00m00s [108/152] elfutils-libs-0:0.191-4.fc40. 100% | 28.7 MiB/s | 264.4 KiB | 00m00s [109/152] pkgconf-0:2.1.0-1.fc40.aarch6 100% | 10.6 MiB/s | 43.5 KiB | 00m00s [110/152] libgomp-0:14.0.1-0.13.fc40.aa 100% | 81.6 MiB/s | 334.4 KiB | 00m00s [111/152] pkgconf-m4-0:2.1.0-1.fc40.noa 100% | 2.3 MiB/s | 13.9 KiB | 00m00s [112/152] lua-libs-0:5.4.6-5.fc40.aarch 100% | 42.8 MiB/s | 131.5 KiB | 00m00s [113/152] rpm-sequoia-0:1.6.0-2.fc40.aa 100% | 159.6 MiB/s | 817.3 KiB | 00m00s [114/152] lz4-libs-0:1.9.4-6.fc40.aarch 100% | 11.0 MiB/s | 67.6 KiB | 00m00s [115/152] sqlite-libs-0:3.45.1-2.fc40.a 100% | 86.0 MiB/s | 704.9 KiB | 00m00s [116/152] authselect-0:1.5.0-5.fc40.aar 100% | 35.7 MiB/s | 146.2 KiB | 00m00s [117/152] gdbm-1:1.23-6.fc40.aarch64 100% | 50.1 MiB/s | 154.0 KiB | 00m00s [118/152] gdbm-libs-1:1.23-6.fc40.aarch 100% | 18.4 MiB/s | 56.5 KiB | 00m00s [119/152] libnsl2-0:2.0.1-1.fc40.aarch6 100% | 9.7 MiB/s | 29.9 KiB | 00m00s [120/152] libpwquality-0:1.4.5-9.fc40.a 100% | 58.7 MiB/s | 120.3 KiB | 00m00s [121/152] libtirpc-0:1.3.4-1.rc3.fc40.a 100% | 45.9 MiB/s | 94.0 KiB | 00m00s [122/152] basesystem-0:11-20.fc40.noarc 100% | 7.0 MiB/s | 7.2 KiB | 00m00s [123/152] p11-kit-0:0.25.3-4.fc40.aarch 100% | 122.0 MiB/s | 499.7 KiB | 00m00s [124/152] p11-kit-trust-0:0.25.3-4.fc40 100% | 27.7 MiB/s | 141.8 KiB | 00m00s [125/152] elfutils-default-yama-scope-0 100% | 6.6 MiB/s | 13.5 KiB | 00m00s [126/152] libxml2-0:2.12.5-1.fc40.aarch 100% | 111.6 MiB/s | 685.5 KiB | 00m00s [127/152] glibc-gconv-extra-0:2.39.9999 100% | 110.7 MiB/s | 2.0 MiB | 00m00s [128/152] cracklib-0:2.9.11-5.fc40.aarc 100% | 22.9 MiB/s | 94.0 KiB | 00m00s [129/152] libpkgconf-0:2.1.0-1.fc40.aar 100% | 4.2 MiB/s | 38.4 KiB | 00m00s [130/152] libcom_err-0:1.47.0-5.fc40.aa 100% | 12.4 MiB/s | 25.5 KiB | 00m00s [131/152] krb5-libs-0:1.21.2-5.fc40.aar 100% | 150.9 MiB/s | 772.5 KiB | 00m00s 
[132/152] libffi-0:3.4.4-7.fc40.aarch64 100% | 7.3 MiB/s | 37.5 KiB | 00m00s [133/152] libtasn1-0:4.19.0-6.fc40.aarc 100% | 17.8 MiB/s | 73.1 KiB | 00m00s [134/152] keyutils-libs-0:1.6.3-3.fc40. 100% | 15.4 MiB/s | 31.6 KiB | 00m00s [135/152] fedora-release-0:40-38.noarch 100% | 10.5 MiB/s | 10.8 KiB | 00m00s [136/152] libverto-0:0.3.2-8.fc40.aarch 100% | 10.1 MiB/s | 20.7 KiB | 00m00s [137/152] xxhash-libs-0:0.8.2-2.fc40.aa 100% | 8.4 MiB/s | 34.3 KiB | 00m00s [138/152] fedora-release-identity-basic 100% | 2.3 MiB/s | 11.6 KiB | 00m00s [139/152] gdb-minimal-0:14.2-1.fc40.aar 100% | 238.2 MiB/s | 4.0 MiB | 00m00s [140/152] libcurl-0:8.6.0-7.fc40.aarch6 100% | 24.0 MiB/s | 343.6 KiB | 00m00s [141/152] libbrotli-0:1.1.0-3.fc40.aarc 100% | 24.1 MiB/s | 345.7 KiB | 00m00s [142/152] libidn2-0:2.3.7-1.fc40.aarch6 100% | 23.3 MiB/s | 119.1 KiB | 00m00s [143/152] libnghttp2-0:1.59.0-2.fc40.aa 100% | 37.1 MiB/s | 76.0 KiB | 00m00s [144/152] libpsl-0:0.21.5-3.fc40.aarch6 100% | 20.9 MiB/s | 64.2 KiB | 00m00s [145/152] libssh-0:0.10.6-5.fc40.aarch6 100% | 52.0 MiB/s | 213.2 KiB | 00m00s [146/152] openldap-0:2.6.7-1.fc40.aarch 100% | 61.6 MiB/s | 252.4 KiB | 00m00s [147/152] libssh-config-0:0.10.6-5.fc40 100% | 4.4 MiB/s | 9.0 KiB | 00m00s [148/152] libunistring-0:1.1-7.fc40.aar 100% | 75.8 MiB/s | 543.6 KiB | 00m00s [149/152] publicsuffix-list-dafsa-0:202 100% | 9.5 MiB/s | 58.1 KiB | 00m00s [150/152] libevent-0:2.1.12-12.fc40.aar 100% | 41.5 MiB/s | 255.2 KiB | 00m00s [151/152] cyrus-sasl-lib-0:2.1.28-19.fc 100% | 95.1 MiB/s | 778.7 KiB | 00m00s [152/152] libtool-ltdl-0:2.4.7-10.fc40. 
100% |   5.9 MiB/s |  36.3 KiB | 00m00s
--------------------------------------------------------------------------------
[152/152] Total                          100% | 111.0 MiB/s |  52.8 MiB | 00m00s
Running transaction
Importing PGP key 0xA15B79CC:
 Userid     : "Fedora (40) <fedora-40-primary@fedoraproject.org>"
 Fingerprint: 115DF9AEF857853EE8445D0A0727707EA15B79CC
 From       : file:///usr/share/distribution-gpg-keys/fedora/RPM-GPG-KEY-fedora-40-primary
The key was successfully imported.
[  1/154] Verify package files          100% | 524.0   B/s | 152.0   B | 00m00s
>>> Running pre-transaction scriptlet: filesystem-0:3.18-8.fc40.aarch64
>>> Stop pre-transaction scriptlet: filesystem-0:3.18-8.fc40.aarch64
[  2/154] Prepare transaction           100% |   2.5 KiB/s | 152.0   B | 00m00s
[  3/154] Installing libgcc-0:14.0.1-0. 100% | 171.8 MiB/s | 351.8 KiB | 00m00s
>>> Running post-install scriptlet: libgcc-0:14.0.1-0.13.fc40.aarch64
>>> Stop post-install scriptlet: libgcc-0:14.0.1-0.13.fc40.aarch64
[  4/154] Installing crypto-policies-0: 100% |  22.2 MiB/s | 181.7 KiB | 00m00s
>>> Running post-install scriptlet: crypto-policies-0:20240201-2.git9f501f3.fc40
>>> Stop post-install scriptlet: crypto-policies-0:20240201-2.git9f501f3.fc40.no
[  5/154] Installing fedora-release-ide 100% | 890.6 KiB/s | 912.0   B | 00m00s
[  6/154] Installing fedora-gpg-keys-0: 100% |  27.7 MiB/s | 170.1 KiB | 00m00s
[  7/154] Installing fedora-repos-0:40- 100% |   0.0   B/s |   5.7 KiB | 00m00s
[  8/154] Installing fedora-release-com 100% |  22.7 MiB/s |  23.3 KiB | 00m00s
[  9/154] Installing fedora-release-0:4 100% |   0.0   B/s | 124.0   B | 00m00s
[ 10/154] Installing setup-0:2.14.5-2.f 100% |  41.7 MiB/s | 725.8 KiB | 00m00s
>>> Running post-install scriptlet: setup-0:2.14.5-2.fc40.noarch
>>> Stop post-install scriptlet: setup-0:2.14.5-2.fc40.noarch
[ 11/154] Installing filesystem-0:3.18- 100% |   2.3 MiB/s | 212.4 KiB | 00m00s
[ 12/154] Installing basesystem-0:11-20 100% |   0.0   B/s | 124.0   B | 00m00s
[ 13/154] Installing libssh-config-0:0.
100% | 0.0 B/s | 816.0 B | 00m00s [ 14/154] Installing publicsuffix-list- 100% | 66.7 MiB/s | 68.3 KiB | 00m00s [ 15/154] Installing pkgconf-m4-0:2.1.0 100% | 0.0 B/s | 14.3 KiB | 00m00s [ 16/154] Installing pcre2-syntax-0:10. 100% | 116.0 MiB/s | 237.6 KiB | 00m00s [ 17/154] Installing ncurses-base-0:6.4 100% | 57.2 MiB/s | 351.6 KiB | 00m00s [ 18/154] Installing glibc-minimal-lang 100% | 0.0 B/s | 124.0 B | 00m00s [ 19/154] Installing ncurses-libs-0:6.4 100% | 280.9 MiB/s | 2.2 MiB | 00m00s >>> Running pre-install scriptlet: glibc-0:2.39.9999-99.fc40.aarch64 >>> Stop pre-install scriptlet: glibc-0:2.39.9999-99.fc40.aarch64 [ 20/154] Installing glibc-0:2.39.9999- 100% | 249.9 MiB/s | 9.7 MiB | 00m00s >>> Running post-install scriptlet: glibc-0:2.39.9999-99.fc40.aarch64 >>> Stop post-install scriptlet: glibc-0:2.39.9999-99.fc40.aarch64 [ 21/154] Installing bash-0:5.2.26-3.fc 100% | 319.7 MiB/s | 8.3 MiB | 00m00s >>> Running post-install scriptlet: bash-0:5.2.26-3.fc40.aarch64 >>> Stop post-install scriptlet: bash-0:5.2.26-3.fc40.aarch64 [ 22/154] Installing glibc-common-0:2.3 100% | 284.7 MiB/s | 2.6 MiB | 00m00s [ 23/154] Installing glibc-gconv-extra- 100% | 545.0 MiB/s | 49.0 MiB | 00m00s >>> Running post-install scriptlet: glibc-gconv-extra-0:2.39.9999-99.fc40.aarch6 >>> Stop post-install scriptlet: glibc-gconv-extra-0:2.39.9999-99.fc40.aarch64 [ 24/154] Installing zlib-ng-compat-0:2 100% | 128.2 MiB/s | 262.5 KiB | 00m00s [ 25/154] Installing xz-libs-1:5.4.6-3. 100% | 260.5 MiB/s | 266.7 KiB | 00m00s [ 26/154] Installing bzip2-libs-0:1.0.8 100% | 197.0 MiB/s | 201.8 KiB | 00m00s [ 27/154] Installing readline-0:8.2-8.f 100% | 225.0 MiB/s | 691.2 KiB | 00m00s [ 28/154] Installing popt-0:1.19-6.fc40 100% | 68.2 MiB/s | 279.4 KiB | 00m00s [ 29/154] Installing libuuid-0:2.40-0.9 100% | 194.2 MiB/s | 198.9 KiB | 00m00s [ 30/154] Installing libstdc++-0:14.0.1 100% | 277.0 MiB/s | 2.8 MiB | 00m00s [ 31/154] Installing libzstd-0:1.5.5-5. 
100% | 259.5 MiB/s | 797.1 KiB | 00m00s [ 32/154] Installing elfutils-libelf-0: 100% | 328.4 MiB/s | 1.3 MiB | 00m00s [ 33/154] Installing libblkid-0:2.40-0. 100% | 192.4 MiB/s | 394.0 KiB | 00m00s [ 34/154] Installing gmp-1:6.2.1-8.fc40 100% | 235.5 MiB/s | 723.4 KiB | 00m00s [ 35/154] Installing libattr-0:2.5.2-3. 100% | 192.8 MiB/s | 197.4 KiB | 00m00s [ 36/154] Installing libacl-0:2.3.2-1.f 100% | 192.2 MiB/s | 196.8 KiB | 00m00s [ 37/154] Installing libxcrypt-0:4.4.36 100% | 196.1 MiB/s | 401.6 KiB | 00m00s [ 38/154] Installing libeconf-0:0.6.2-1 100% | 202.8 MiB/s | 207.6 KiB | 00m00s [ 39/154] Installing lz4-libs-0:1.9.4-6 100% | 256.3 MiB/s | 262.5 KiB | 00m00s [ 40/154] Installing gdbm-libs-1:1.23-6 100% | 208.7 MiB/s | 427.5 KiB | 00m00s [ 41/154] Installing mpfr-0:4.2.1-3.fc4 100% | 267.0 MiB/s | 820.2 KiB | 00m00s [ 42/154] Installing gawk-0:5.3.0-3.fc4 100% | 387.5 MiB/s | 4.3 MiB | 00m00s [ 43/154] Installing dwz-0:0.15-6.fc40. 100% | 189.5 MiB/s | 388.1 KiB | 00m00s [ 44/154] Installing unzip-0:6.0-63.fc4 100% | 237.6 MiB/s | 729.8 KiB | 00m00s [ 45/154] Installing file-libs-0:5.45-4 100% | 589.6 MiB/s | 10.0 MiB | 00m00s [ 46/154] Installing file-0:5.45-4.fc40 100% | 262.6 MiB/s | 268.9 KiB | 00m00s [ 47/154] Installing pcre2-0:10.42-2.fc 100% | 295.2 MiB/s | 907.0 KiB | 00m00s [ 48/154] Installing grep-0:3.11-7.fc40 100% | 156.7 MiB/s | 1.1 MiB | 00m00s [ 49/154] Installing xz-1:5.4.6-3.fc40. 
100% | 207.6 MiB/s | 2.3 MiB | 00m00s [ 50/154] Installing libcap-ng-0:0.8.4- 100% | 204.5 MiB/s | 418.9 KiB | 00m00s [ 51/154] Installing audit-libs-0:4.0.1 100% | 268.2 MiB/s | 549.3 KiB | 00m00s [ 52/154] Installing pam-libs-0:1.6.0-2 100% | 297.5 MiB/s | 609.4 KiB | 00m00s [ 53/154] Installing libcap-0:2.69-3.fc 100% | 343.0 MiB/s | 1.4 MiB | 00m00s [ 54/154] Installing systemd-libs-0:255 100% | 307.0 MiB/s | 2.5 MiB | 00m00s [ 55/154] Installing libsmartcols-0:2.4 100% | 141.6 MiB/s | 290.1 KiB | 00m00s [ 56/154] Installing libsepol-0:3.6-3.f 100% | 284.8 MiB/s | 874.9 KiB | 00m00s [ 57/154] Installing libselinux-0:3.6-4 100% | 130.0 MiB/s | 266.3 KiB | 00m00s [ 58/154] Installing sed-0:4.9-1.fc40.a 100% | 164.3 MiB/s | 1.0 MiB | 00m00s [ 59/154] Installing findutils-1:4.9.0- 100% | 207.6 MiB/s | 1.7 MiB | 00m00s [ 60/154] Installing libmount-0:2.40-0. 100% | 237.0 MiB/s | 485.4 KiB | 00m00s [ 61/154] Installing alternatives-0:1.2 100% | 214.7 MiB/s | 219.9 KiB | 00m00s [ 62/154] Installing jansson-0:2.13.1-9 100% | 216.5 MiB/s | 221.7 KiB | 00m00s [ 63/154] Installing lua-libs-0:5.4.6-5 100% | 192.5 MiB/s | 394.2 KiB | 00m00s [ 64/154] Installing libcom_err-0:1.47. 100% | 234.7 MiB/s | 240.3 KiB | 00m00s [ 65/154] Installing libtasn1-0:4.19.0- 100% | 139.4 MiB/s | 285.5 KiB | 00m00s [ 66/154] Installing libunistring-0:1.1 100% | 311.9 MiB/s | 1.9 MiB | 00m00s [ 67/154] Installing libidn2-0:2.3.7-1. 100% | 113.0 MiB/s | 463.0 KiB | 00m00s [ 68/154] Installing libpsl-0:0.21.5-3. 100% | 193.0 MiB/s | 197.6 KiB | 00m00s [ 69/154] Installing util-linux-core-0: 100% | 435.7 MiB/s | 6.1 MiB | 00m00s [ 70/154] Installing tar-2:1.35-3.fc40. 
100% | 278.8 MiB/s | 3.1 MiB | 00m00s [ 71/154] Installing libsemanage-0:3.6- 100% | 118.2 MiB/s | 363.2 KiB | 00m00s [ 72/154] Installing shadow-utils-2:4.1 100% | 171.6 MiB/s | 7.4 MiB | 00m00s >>> Running pre-install scriptlet: libutempter-0:1.2.1-13.fc40.aarch64 >>> Stop pre-install scriptlet: libutempter-0:1.2.1-13.fc40.aarch64 [ 73/154] Installing libutempter-0:1.2. 100% | 204.9 MiB/s | 419.6 KiB | 00m00s [ 74/154] Installing zip-0:3.0-40.fc40. 100% | 281.0 MiB/s | 1.1 MiB | 00m00s [ 75/154] Installing gdbm-1:1.23-6.fc40 100% | 227.8 MiB/s | 933.2 KiB | 00m00s [ 76/154] Installing cyrus-sasl-lib-0:2 100% | 310.7 MiB/s | 3.1 MiB | 00m00s [ 77/154] Installing zstd-0:1.5.5-5.fc4 100% | 312.0 MiB/s | 1.6 MiB | 00m00s [ 78/154] Installing libfdisk-0:2.40-0. 100% | 236.6 MiB/s | 484.5 KiB | 00m00s [ 79/154] Installing bzip2-0:1.0.8-18.f 100% | 210.9 MiB/s | 432.0 KiB | 00m00s [ 80/154] Installing libxml2-0:2.12.5-1 100% | 314.8 MiB/s | 2.2 MiB | 00m00s [ 81/154] Installing sqlite-libs-0:3.45 100% | 249.3 MiB/s | 1.5 MiB | 00m00s [ 82/154] Installing ed-0:1.20.1-1.fc40 100% | 139.0 MiB/s | 284.7 KiB | 00m00s [ 83/154] Installing patch-0:2.7.6-24.f 100% | 191.4 MiB/s | 392.0 KiB | 00m00s [ 84/154] Installing elfutils-default-y 100% | 291.9 KiB/s | 2.0 KiB | 00m00s >>> Running post-install scriptlet: elfutils-default-yama-scope-0:0.191-4.fc40.n >>> Stop post-install scriptlet: elfutils-default-yama-scope-0:0.191-4.fc40.noar [ 85/154] Installing cpio-0:2.15-1.fc40 100% | 174.4 MiB/s | 1.2 MiB | 00m00s [ 86/154] Installing diffutils-0:3.10-5 100% | 263.6 MiB/s | 2.1 MiB | 00m00s [ 87/154] Installing libgomp-0:14.0.1-0 100% | 277.4 MiB/s | 568.1 KiB | 00m00s [ 88/154] Installing libpkgconf-0:2.1.0 100% | 194.5 MiB/s | 199.1 KiB | 00m00s [ 89/154] Installing pkgconf-0:2.1.0-1. 
100% | 117.6 MiB/s | 240.8 KiB | 00m00s [ 90/154] Installing pkgconf-pkg-config 100% | 1.7 MiB/s | 1.8 KiB | 00m00s [ 91/154] Installing libffi-0:3.4.4-7.f 100% | 276.2 MiB/s | 282.8 KiB | 00m00s [ 92/154] Installing p11-kit-0:0.25.3-4 100% | 217.8 MiB/s | 2.8 MiB | 00m00s [ 93/154] Installing p11-kit-trust-0:0. 100% | 64.2 MiB/s | 657.2 KiB | 00m00s >>> Running post-install scriptlet: p11-kit-trust-0:0.25.3-4.fc40.aarch64 >>> Stop post-install scriptlet: p11-kit-trust-0:0.25.3-4.fc40.aarch64 [ 94/154] Installing keyutils-libs-0:1. 100% | 111.2 MiB/s | 227.8 KiB | 00m00s [ 95/154] Installing libverto-0:0.3.2-8 100% | 194.6 MiB/s | 199.2 KiB | 00m00s [ 96/154] Installing xxhash-libs-0:0.8. 100% | 208.6 MiB/s | 213.6 KiB | 00m00s [ 97/154] Installing libbrotli-0:1.1.0- 100% | 285.1 MiB/s | 1.1 MiB | 00m00s [ 98/154] Installing libnghttp2-0:1.59. 100% | 257.0 MiB/s | 263.2 KiB | 00m00s [ 99/154] Installing libtool-ltdl-0:2.4 100% | 218.0 MiB/s | 223.3 KiB | 00m00s [100/154] Installing rust-srpm-macros-0 100% | 5.4 MiB/s | 5.6 KiB | 00m00s [101/154] Installing qt6-srpm-macros-0: 100% | 0.0 B/s | 732.0 B | 00m00s [102/154] Installing qt5-srpm-macros-0: 100% | 0.0 B/s | 768.0 B | 00m00s [103/154] Installing perl-srpm-macros-0 100% | 0.0 B/s | 1.1 KiB | 00m00s [104/154] Installing package-notes-srpm 100% | 0.0 B/s | 2.0 KiB | 00m00s [105/154] Installing openblas-srpm-macr 100% | 0.0 B/s | 384.0 B | 00m00s [106/154] Installing ocaml-srpm-macros- 100% | 0.0 B/s | 2.2 KiB | 00m00s [107/154] Installing kernel-srpm-macros 100% | 0.0 B/s | 2.3 KiB | 00m00s [108/154] Installing gnat-srpm-macros-0 100% | 0.0 B/s | 1.3 KiB | 00m00s [109/154] Installing ghc-srpm-macros-0: 100% | 0.0 B/s | 992.0 B | 00m00s [110/154] Installing fpc-srpm-macros-0: 100% | 0.0 B/s | 420.0 B | 00m00s [111/154] Installing ansible-srpm-macro 100% | 35.4 MiB/s | 36.2 KiB | 00m00s [112/154] Installing coreutils-common-0 100% | 301.7 MiB/s | 11.5 MiB | 00m00s [113/154] Installing openssl-libs-1:3.2 100% | 
337.9 MiB/s | 7.8 MiB | 00m00s [114/154] Installing coreutils-0:9.4-6. 100% | 452.0 MiB/s | 20.8 MiB | 00m00s >>> Running pre-install scriptlet: ca-certificates-0:2023.2.62_v7.0.401-6.fc40.n >>> Stop pre-install scriptlet: ca-certificates-0:2023.2.62_v7.0.401-6.fc40.noar [115/154] Installing ca-certificates-0: 100% | 2.4 MiB/s | 2.3 MiB | 00m01s >>> Running post-install scriptlet: ca-certificates-0:2023.2.62_v7.0.401-6.fc40. >>> Stop post-install scriptlet: ca-certificates-0:2023.2.62_v7.0.401-6.fc40.noa [116/154] Installing krb5-libs-0:1.21.2 100% | 262.0 MiB/s | 3.4 MiB | 00m00s [117/154] Installing libtirpc-0:1.3.4-1 100% | 135.0 MiB/s | 276.4 KiB | 00m00s [118/154] Installing gzip-0:1.13-1.fc40 100% | 160.9 MiB/s | 494.3 KiB | 00m00s [119/154] Installing authselect-libs-0: 100% | 132.1 MiB/s | 946.7 KiB | 00m00s [120/154] Installing libarchive-0:3.7.2 100% | 254.0 MiB/s | 1.0 MiB | 00m00s [121/154] Installing authselect-0:1.5.0 100% | 102.1 MiB/s | 313.8 KiB | 00m00s [122/154] Installing cracklib-0:2.9.11- 100% | 154.0 MiB/s | 946.0 KiB | 00m00s [123/154] Installing libpwquality-0:1.4 100% | 158.2 MiB/s | 1.1 MiB | 00m00s [124/154] Installing libnsl2-0:2.0.1-1. 100% | 108.9 MiB/s | 223.0 KiB | 00m00s [125/154] Installing pam-0:1.6.0-2.fc40 100% | 380.3 MiB/s | 11.0 MiB | 00m00s [126/154] Installing libssh-0:0.10.6-5. 100% | 189.9 MiB/s | 583.2 KiB | 00m00s [127/154] Installing rpm-sequoia-0:1.6. 100% | 318.5 MiB/s | 2.2 MiB | 00m00s [128/154] Installing rpm-libs-0:4.19.1. 100% | 281.0 MiB/s | 863.2 KiB | 00m00s [129/154] Installing libevent-0:2.1.12- 100% | 304.6 MiB/s | 1.5 MiB | 00m00s [130/154] Installing openldap-0:2.6.7-1 100% | 248.7 MiB/s | 1.0 MiB | 00m00s [131/154] Installing libcurl-0:8.6.0-7. 100% | 279.2 MiB/s | 857.7 KiB | 00m00s [132/154] Installing elfutils-debuginfo 100% | 194.7 MiB/s | 398.7 KiB | 00m00s [133/154] Installing elfutils-libs-0:0. 100% | 245.1 MiB/s | 1.0 MiB | 00m00s [134/154] Installing binutils-gold-0:2. 
100% | 180.8 MiB/s | 3.1 MiB | 00m00s >>> Running post-install scriptlet: binutils-gold-0:2.41-34.fc40.aarch64 >>> Stop post-install scriptlet: binutils-gold-0:2.41-34.fc40.aarch64 [135/154] Installing binutils-0:2.41-34 100% | 341.7 MiB/s | 32.8 MiB | 00m00s >>> Running post-install scriptlet: binutils-0:2.41-34.fc40.aarch64 >>> Stop post-install scriptlet: binutils-0:2.41-34.fc40.aarch64 [136/154] Installing elfutils-0:0.191-4 100% | 358.3 MiB/s | 5.0 MiB | 00m00s [137/154] Installing gdb-minimal-0:14.2 100% | 342.1 MiB/s | 12.7 MiB | 00m00s [138/154] Installing debugedit-0:5.0-14 100% | 244.9 MiB/s | 501.5 KiB | 00m00s [139/154] Installing rpm-build-libs-0:4 100% | 128.5 MiB/s | 263.2 KiB | 00m00s [140/154] Installing curl-0:8.6.0-7.fc4 100% | 60.6 MiB/s | 868.9 KiB | 00m00s >>> Running pre-install scriptlet: rpm-0:4.19.1.1-1.fc40.aarch64 >>> Stop pre-install scriptlet: rpm-0:4.19.1.1-1.fc40.aarch64 [141/154] Installing rpm-0:4.19.1.1-1.f 100% | 149.3 MiB/s | 3.4 MiB | 00m00s [142/154] Installing efi-srpm-macros-0: 100% | 40.2 MiB/s | 41.2 KiB | 00m00s [143/154] Installing lua-srpm-macros-0: 100% | 0.0 B/s | 1.9 KiB | 00m00s [144/154] Installing zig-srpm-macros-0: 100% | 0.0 B/s | 1.7 KiB | 00m00s [145/154] Installing fonts-srpm-macros- 100% | 55.1 MiB/s | 56.5 KiB | 00m00s [146/154] Installing forge-srpm-macros- 100% | 37.7 MiB/s | 38.6 KiB | 00m00s [147/154] Installing go-srpm-macros-0:3 100% | 60.2 MiB/s | 61.6 KiB | 00m00s [148/154] Installing python-srpm-macros 100% | 50.1 MiB/s | 51.3 KiB | 00m00s [149/154] Installing redhat-rpm-config- 100% | 62.4 MiB/s | 191.7 KiB | 00m00s [150/154] Installing rpm-build-0:4.19.1 100% | 301.1 MiB/s | 1.2 MiB | 00m00s [151/154] Installing pyproject-srpm-mac 100% | 1.0 MiB/s | 2.1 KiB | 00m00s [152/154] Installing util-linux-0:2.40- 100% | 364.8 MiB/s | 17.5 MiB | 00m00s >>> Running post-install scriptlet: util-linux-0:2.40-0.9.rc1.fc40.aarch64 >>> Stop post-install scriptlet: util-linux-0:2.40-0.9.rc1.fc40.aarch64 
[153/154] Installing which-0:2.21-41.fc 100% | 122.2 MiB/s | 250.3 KiB | 00m00s [154/154] Installing info-0:7.1-2.fc40. 100% | 480.0 KiB/s | 613.9 KiB | 00m01s >>> Running post-transaction scriptlet: filesystem-0:3.18-8.fc40.aarch64 >>> Stop post-transaction scriptlet: filesystem-0:3.18-8.fc40.aarch64 >>> Running post-transaction scriptlet: ca-certificates-0:2023.2.62_v7.0.401-6.f >>> Stop post-transaction scriptlet: ca-certificates-0:2023.2.62_v7.0.401-6.fc40 >>> Running post-transaction scriptlet: authselect-libs-0:1.5.0-5.fc40.aarch64 >>> Stop post-transaction scriptlet: authselect-libs-0:1.5.0-5.fc40.aarch64 >>> Running post-transaction scriptlet: rpm-0:4.19.1.1-1.fc40.aarch64 >>> Stop post-transaction scriptlet: rpm-0:4.19.1.1-1.fc40.aarch64 >>> Running trigger-install scriptlet: glibc-common-0:2.39.9999-99.fc40.aarch64 >>> Stop trigger-install scriptlet: glibc-common-0:2.39.9999-99.fc40.aarch64 >>> Running trigger-install scriptlet: info-0:7.1-2.fc40.aarch64 >>> Stop trigger-install scriptlet: info-0:7.1-2.fc40.aarch64 Warning: skipped PGP checks for 4 package(s). 
Finish: installing minimal buildroot with dnf5
Start: creating root cache
Finish: creating root cache
Finish: chroot init
INFO: Installed packages:
INFO: alternatives-1.26-3.fc40.aarch64 ansible-srpm-macros-1-14.fc40.noarch audit-libs-4.0.1-1.fc40.aarch64 authselect-1.5.0-5.fc40.aarch64 authselect-libs-1.5.0-5.fc40.aarch64 basesystem-11-20.fc40.noarch bash-5.2.26-3.fc40.aarch64 binutils-2.41-34.fc40.aarch64 binutils-gold-2.41-34.fc40.aarch64 bzip2-1.0.8-18.fc40.aarch64 bzip2-libs-1.0.8-18.fc40.aarch64 ca-certificates-2023.2.62_v7.0.401-6.fc40.noarch coreutils-9.4-6.fc40.aarch64 coreutils-common-9.4-6.fc40.aarch64 cpio-2.15-1.fc40.aarch64 cracklib-2.9.11-5.fc40.aarch64 crypto-policies-20240201-2.git9f501f3.fc40.noarch curl-8.6.0-7.fc40.aarch64 cyrus-sasl-lib-2.1.28-19.fc40.aarch64 debugedit-5.0-14.fc40.aarch64 diffutils-3.10-5.fc40.aarch64 dwz-0.15-6.fc40.aarch64 ed-1.20.1-1.fc40.aarch64 efi-srpm-macros-5-11.fc40.noarch elfutils-0.191-4.fc40.aarch64 elfutils-debuginfod-client-0.191-4.fc40.aarch64 elfutils-default-yama-scope-0.191-4.fc40.noarch elfutils-libelf-0.191-4.fc40.aarch64 elfutils-libs-0.191-4.fc40.aarch64 fedora-gpg-keys-40-1.noarch fedora-release-40-38.noarch fedora-release-common-40-38.noarch fedora-release-identity-basic-40-38.noarch fedora-repos-40-1.noarch file-5.45-4.fc40.aarch64 file-libs-5.45-4.fc40.aarch64 filesystem-3.18-8.fc40.aarch64 findutils-4.9.0-8.fc40.aarch64 fonts-srpm-macros-2.0.5-14.fc40.noarch forge-srpm-macros-0.2.0-3.fc40.noarch fpc-srpm-macros-1.3-12.fc40.noarch gawk-5.3.0-3.fc40.aarch64 gdb-minimal-14.2-1.fc40.aarch64 gdbm-1.23-6.fc40.aarch64 gdbm-libs-1.23-6.fc40.aarch64 ghc-srpm-macros-1.9-1.fc40.noarch glibc-2.39.9999-99.fc40.aarch64 glibc-common-2.39.9999-99.fc40.aarch64 glibc-gconv-extra-2.39.9999-99.fc40.aarch64 glibc-minimal-langpack-2.39.9999-99.fc40.aarch64 gmp-6.2.1-8.fc40.aarch64 gnat-srpm-macros-6-5.fc40.noarch go-srpm-macros-3.5.0-1.fc40.noarch gpg-pubkey-a15b79cc-63d04c2c grep-3.11-7.fc40.aarch64 gzip-1.13-1.fc40.aarch64 info-7.1-2.fc40.aarch64 jansson-2.13.1-9.fc40.aarch64 kernel-srpm-macros-1.0-23.fc40.noarch keyutils-libs-1.6.3-3.fc40.aarch64 krb5-libs-1.21.2-5.fc40.aarch64 libacl-2.3.2-1.fc40.aarch64 libarchive-3.7.2-3.fc40.aarch64 libattr-2.5.2-3.fc40.aarch64 libblkid-2.40-0.9.rc1.fc40.aarch64 libbrotli-1.1.0-3.fc40.aarch64 libcap-2.69-3.fc40.aarch64 libcap-ng-0.8.4-4.fc40.aarch64 libcom_err-1.47.0-5.fc40.aarch64 libcurl-8.6.0-7.fc40.aarch64 libeconf-0.6.2-1.fc40.aarch64 libevent-2.1.12-12.fc40.aarch64 libfdisk-2.40-0.9.rc1.fc40.aarch64 libffi-3.4.4-7.fc40.aarch64 libgcc-14.0.1-0.13.fc40.aarch64 libgomp-14.0.1-0.13.fc40.aarch64 libidn2-2.3.7-1.fc40.aarch64 libmount-2.40-0.9.rc1.fc40.aarch64 libnghttp2-1.59.0-2.fc40.aarch64 libnsl2-2.0.1-1.fc40.aarch64 libpkgconf-2.1.0-1.fc40.aarch64 libpsl-0.21.5-3.fc40.aarch64 libpwquality-1.4.5-9.fc40.aarch64 libselinux-3.6-4.fc40.aarch64 libsemanage-3.6-3.fc40.aarch64 libsepol-3.6-3.fc40.aarch64 libsmartcols-2.40-0.9.rc1.fc40.aarch64 libssh-0.10.6-5.fc40.aarch64 libssh-config-0.10.6-5.fc40.noarch libstdc++-14.0.1-0.13.fc40.aarch64 libtasn1-4.19.0-6.fc40.aarch64 libtirpc-1.3.4-1.rc3.fc40.aarch64 libtool-ltdl-2.4.7-10.fc40.aarch64 libunistring-1.1-7.fc40.aarch64 libutempter-1.2.1-13.fc40.aarch64 libuuid-2.40-0.9.rc1.fc40.aarch64 libverto-0.3.2-8.fc40.aarch64 libxcrypt-4.4.36-5.fc40.aarch64 libxml2-2.12.5-1.fc40.aarch64 libzstd-1.5.5-5.fc40.aarch64 lua-libs-5.4.6-5.fc40.aarch64 lua-srpm-macros-1-13.fc40.noarch lz4-libs-1.9.4-6.fc40.aarch64 mpfr-4.2.1-3.fc40.aarch64 ncurses-base-6.4-12.20240127.fc40.noarch ncurses-libs-6.4-12.20240127.fc40.aarch64 ocaml-srpm-macros-9-3.fc40.noarch openblas-srpm-macros-2-16.fc40.noarch openldap-2.6.7-1.fc40.aarch64 openssl-libs-3.2.1-2.fc40.aarch64 p11-kit-0.25.3-4.fc40.aarch64 p11-kit-trust-0.25.3-4.fc40.aarch64 package-notes-srpm-macros-0.5-11.fc40.noarch pam-1.6.0-2.fc40.aarch64 pam-libs-1.6.0-2.fc40.aarch64 patch-2.7.6-24.fc40.aarch64 pcre2-10.42-2.fc40.2.aarch64 pcre2-syntax-10.42-2.fc40.2.noarch perl-srpm-macros-1-53.fc40.noarch pkgconf-2.1.0-1.fc40.aarch64 pkgconf-m4-2.1.0-1.fc40.noarch pkgconf-pkg-config-2.1.0-1.fc40.aarch64 popt-1.19-6.fc40.aarch64 publicsuffix-list-dafsa-20240107-3.fc40.noarch pyproject-srpm-macros-1.12.0-1.fc40.noarch python-srpm-macros-3.12-7.fc40.noarch qt5-srpm-macros-5.15.13-1.fc40.noarch qt6-srpm-macros-6.6.2-1.fc40.noarch readline-8.2-8.fc40.aarch64 redhat-rpm-config-286-1.fc40.noarch rpm-4.19.1.1-1.fc40.aarch64 rpm-build-4.19.1.1-1.fc40.aarch64 rpm-build-libs-4.19.1.1-1.fc40.aarch64 rpm-libs-4.19.1.1-1.fc40.aarch64 rpm-sequoia-1.6.0-2.fc40.aarch64 rust-srpm-macros-26.2-1.fc40.noarch sed-4.9-1.fc40.aarch64 setup-2.14.5-2.fc40.noarch shadow-utils-4.15.1-1.fc40.aarch64 sqlite-libs-3.45.1-2.fc40.aarch64 systemd-libs-255.4-1.fc40.aarch64 tar-1.35-3.fc40.aarch64 unzip-6.0-63.fc40.aarch64 util-linux-2.40-0.9.rc1.fc40.aarch64 util-linux-core-2.40-0.9.rc1.fc40.aarch64 which-2.21-41.fc40.aarch64 xxhash-libs-0.8.2-2.fc40.aarch64 xz-5.4.6-3.fc40.aarch64 xz-libs-5.4.6-3.fc40.aarch64 zig-srpm-macros-1-2.fc40.noarch zip-3.0-40.fc40.aarch64 zlib-ng-compat-2.1.6-2.fc40.aarch64 zstd-1.5.5-5.fc40.aarch64
Start: buildsrpm
Start: rpmbuild -bs
warning: %patchN is deprecated (2 usages found), use %patch N (or %patch -P N)
Building target platforms: aarch64
Building for target aarch64
setting SOURCE_DATE_EPOCH=1554595200
Wrote: /builddir/build/SRPMS/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.src.rpm
RPM build warnings: %patchN is deprecated (2 usages found), use %patch N (or %patch -P N)
Finish: rpmbuild -bs
cp: preserving permissions for ‘/var/lib/copr-rpmbuild/results/chroot_scan/var/lib/mock/fedora-40-aarch64-1712885791.289313/root/var/log’: No such file or directory
INFO: chroot_scan: 1 files copied to /var/lib/copr-rpmbuild/results/chroot_scan
INFO: /var/lib/mock/fedora-40-aarch64-1712885791.289313/root/var/log/dnf5.log
Finish: buildsrpm
INFO: Done(/var/lib/copr-rpmbuild/workspace/workdir-zwbyh1gj/pytorch/pytorch.spec) Config(child) 0 minutes 31 seconds
INFO: Results and/or logs in: /var/lib/copr-rpmbuild/results
INFO: Cleaning up build root ('cleanup_on_success=True')
Start: clean chroot
INFO: unmounting tmpfs.
Finish: clean chroot
INFO: Start(/var/lib/copr-rpmbuild/results/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.src.rpm) Config(fedora-40-aarch64)
Start(bootstrap): chroot init
INFO: mounting tmpfs at /var/lib/mock/fedora-40-aarch64-bootstrap-1712885791.289313/root.
INFO: reusing tmpfs at /var/lib/mock/fedora-40-aarch64-bootstrap-1712885791.289313/root.
INFO: calling preinit hooks
INFO: enabled root cache
INFO: enabled package manager cache
Start(bootstrap): cleaning package manager metadata
Finish(bootstrap): cleaning package manager metadata
Finish(bootstrap): chroot init
Start: chroot init
INFO: mounting tmpfs at /var/lib/mock/fedora-40-aarch64-1712885791.289313/root.
INFO: calling preinit hooks
INFO: enabled root cache
Start: unpacking root cache
Finish: unpacking root cache
INFO: enabled package manager cache
Start: cleaning package manager metadata
Finish: cleaning package manager metadata
INFO: enabled HW Info plugin
INFO: Buildroot is handled by package management downloaded with a bootstrap image: rpm-4.19.1.1-1.fc40.aarch64 rpm-sequoia-1.6.0-2.fc40.aarch64 python3-dnf-4.19.0-1.fc40.noarch yum-4.19.0-1.fc40.noarch dnf5-5.1.15-1.fc40.aarch64 dnf5-plugins-5.1.15-1.fc40.aarch64
Finish: chroot init
Start: build phase for pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.src.rpm
Start: build setup for pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.src.rpm
warning: %patchN is deprecated (2 usages found), use %patch N (or %patch -P N)
Building target platforms: aarch64
Building for target aarch64
setting SOURCE_DATE_EPOCH=1554595200
Wrote: /builddir/build/SRPMS/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.src.rpm
RPM build warnings: %patchN is deprecated (2 usages found), use %patch N (or %patch -P N)
Updating and loading repositories:
 fedora 100% | 48.1 KiB/s | 12.9 KiB | 00m00s
 updates 100% | 165.0 KiB/s | 13.2 KiB | 00m00s
 Copr repository 100% | 117.6 KiB/s | 1.5 KiB | 00m00s
 Additional repo copr_rezso_CUDA 100% | 108.7 KiB/s | 1.5 KiB | 00m00s
 Additional repo http_developer_downloa 100% | 1.1 MiB/s | 3.5 KiB | 00m00s
 Additional repo http_developer_downloa 100% | 1.1 MiB/s | 3.5 KiB | 00m00s
 Additional repo http_developer_downloa 100% | 1.1 MiB/s | 3.5 KiB | 00m00s
 Copr repository 100% | 2.7 MiB/s | 158.8 KiB | 00m00s
Repositories loaded.
Package Arch Version Repository Size
Installing:
 asmjit-devel aarch64 1:0-20220702.1.gitc5984762.fc40 copr_base 1.5 MiB
 cpuinfo-devel aarch64 1:0-20240327.0.gitf42f5eaf.fc40 copr_base 79.6 KiB
 cuda-cudart-devel-12-3 aarch64 12.3.101-1 http_developer_download_nvidia_com_compute_cuda_repos_rhel8_sbsa 6.4 MiB
 cuda-cupti-12-3 aarch64 12.3.101-1 http_developer_download_nvidia_com_compute_cuda_repos_rhel8_sbsa 50.3 MiB
 cuda-driver-devel-12-3 aarch64 12.3.101-1 http_developer_download_nvidia_com_compute_cuda_repos_rhel8_sbsa 121.2 KiB
 cuda-gcc-12-c++ aarch64 12.3.1-1.fc39 copr_base 57.2 MiB
 cuda-nvcc-12-3 aarch64 12.3.107-1 http_developer_download_nvidia_com_compute_cuda_repos_rhel8_sbsa 171.7 MiB
 cuda-nvml-devel-12-3 aarch64 12.3.101-1 http_developer_download_nvidia_com_compute_cuda_repos_rhel8_sbsa 664.8 KiB
 cuda-nvrtc-devel-12-3 aarch64 12.3.107-1 http_developer_download_nvidia_com_compute_cuda_repos_rhel8_sbsa 72.8 MiB
 cuda-nvtx-12-3 aarch64 12.3.101-1 http_developer_download_nvidia_com_compute_cuda_repos_rhel8_sbsa 408.5 KiB
 cuda-profiler-api-12-3 aarch64 12.3.101-1 http_developer_download_nvidia_com_compute_cuda_repos_rhel8_sbsa 71.4 KiB
 cutlass-devel aarch64 3.4.1-20240215.0.cu12_3.fc40 copr_base 12.1 MiB
 doxygen aarch64 2:1.10.0-3.fc40 fedora 19.5 MiB
 eigen3-devel noarch 3.4.0-15.fc40 fedora 8.4 MiB
 fftw-devel aarch64 3.3.10-11.fc40 fedora 282.6 KiB
 flatbuffers-compiler aarch64 23.5.26-6.fc40 fedora 2.5 MiB
 flatbuffers-devel aarch64 23.5.26-6.fc40 fedora 464.7 KiB
 foxi-devel aarch64 0-20210526.1.gitc278588e.fc37 copr_base 120.6 KiB
 fp16-devel aarch64 1:0-20240410.0.git581ac1c7.fc40 copr_base 30.4 KiB
 fxdiv-devel noarch 1:0-20201208.1.git63058eff.fc40 copr_base 16.9 KiB
 gcc-c++ aarch64 14.0.1-0.13.fc40 fedora 35.0 MiB
 gemmlowp-devel noarch 0-20231104.0.git16e8662c.fc40 copr_base 2.3 MiB
 gflags-devel aarch64 2.2.2-14.fc40 fedora 62.3 KiB
 git aarch64 2.44.0-1.fc40 fedora 85.2 KiB
 glog-devel aarch64 0.3.5-20.fc40 fedora 112.0 KiB
 gloo-devel aarch64 1:0.5.0-20240411.0.git6c70a556.cu12_3.fc40 copr_base 328.5 KiB
 gmp-devel aarch64 1:6.2.1-8.fc40 fedora 356.4 KiB
 hiredis-devel aarch64 1.0.2-7.fc40 fedora 118.4 KiB
 kineto-devel aarch64 0.4.0-20240327.0.git445909a8.cu12_3.fc40 copr_base 49.6 KiB
 leveldb-devel aarch64 1.23-9.fc40 fedora 137.6 KiB
 libcublas-devel-12-3 aarch64 12.3.4.1-2 copr_rezso_CUDA 726.7 KiB
 libcudnn8-devel aarch64 8.9.7.29-2.cuda12.3 copr_rezso_CUDA 199.2 KiB
 libcufft-devel-12-3 aarch64 11.0.12.1-2 copr_rezso_CUDA 130.1 KiB
 libcurand-devel-12-3 aarch64 10.3.4.107-1 http_developer_download_nvidia_com_compute_cuda_repos_rhel8_sbsa 93.8 MiB
 libcusolver-devel-12-3 aarch64 11.5.4.101-2 copr_rezso_CUDA 461.0 KiB
 libcusparse-devel-12-3 aarch64 12.2.0.103-2 copr_rezso_CUDA 252.1 MiB
 libnccl-devel aarch64 2.21.5-1+cuda12.4 http_developer_download_nvidia_com_compute_cuda_repos_rhel8_sbsa 45.3 KiB
 libnvjitlink-devel-12-3 aarch64 12.3.101-1 http_developer_download_nvidia_com_compute_cuda_repos_rhel8_sbsa 55.5 MiB
 libuv-devel aarch64 1:1.48.0-1.fc40 fedora 206.1 KiB
 libzstd-devel aarch64 1.5.5-5.fc40 fedora 198.1 KiB
 lmdb-devel aarch64 0.9.32-1.fc40 fedora 72.5 KiB
 magma-devel aarch64 2.8.0-20240328.0.cu12_3.fc40 copr_base 21.8 MiB
 mesa-libGLU-devel aarch64 9.0.3-4.fc40 fedora 17.0 KiB
 miniz-devel aarch64 3.0.2-5.fc40 fedora 102.7 KiB
 mpfr-devel aarch64 4.2.1-3.fc40 fedora 62.8 KiB
 neon2sse-devel noarch 0-20230131.0.git097a5eca.fc38 copr_base 802.0 KiB
 nnpack-devel aarch64 0-20230201.0.git70a77f48.fc38 copr_base 42.7 KiB
 numactl-devel aarch64 2.0.16-5.fc40 fedora 25.9 KiB
 ocl-icd-devel aarch64 2.3.2-5.fc40 fedora 335.4 KiB
 onnx-devel aarch64 1.17.0-20240404.0.git4128a090.fc40 copr_base 1.0 MiB
 onnx-optimizer-devel aarch64 0.3.19-20240303.0.gitb3a46118.fc40 copr_base 193.4 KiB
 openblas-devel aarch64 0.3.26-4.fc40 fedora 1.6 MiB
 openblas-openmp aarch64 0.3.26-4.fc40 fedora 19.5 MiB
 opencv-devel aarch64 4.9.0-20231227.1.cu12_3.fc40 copr_base 10.8 MiB
 peachpy-python3 noarch 0-20221113.1.git349e8f83.fc39 copr_base 13.2 MiB
 protobuf-compat-compiler aarch64 3.21.9-2.fc39 copr_base 3.1 MiB
 protobuf-compat-devel aarch64 3.21.9-2.fc39 copr_base 2.7 MiB
 psimd-devel noarch 1:0-20200517.2.git072586a7.fc40 copr_base 45.6 KiB
 pthreadpool-devel aarch64 1:0.1-20240121.0.git178e3e06.fc40 copr_base 100.5 KiB
 pybind11-devel aarch64 2.11.1-3.fc40 fedora 849.0 KiB
 python3-devel aarch64 3.12.2-2.fc40 fedora 1.2 MiB
 python3-numpy aarch64 1:1.26.4-1.fc40 fedora 41.6 MiB
 python3-pybind11 aarch64 2.11.1-3.fc40 fedora 849.6 KiB
 python3-pyyaml aarch64 6.0.1-14.fc40 fedora 858.3 KiB
 python3-setuptools noarch 69.0.3-3.fc40 fedora 7.1 MiB
 python3-six noarch 1.16.0-14.fc40 fedora 117.7 KiB
 python3-typing-extensions noarch 4.9.0-3.fc40 fedora 391.0 KiB
 qnnpack-devel aarch64 0-20190828.2.git7d2a4e99.fc38 copr_base 17.9 KiB
 rdma-core-devel aarch64 48.0-4.fc40 fedora 610.5 KiB
 rocksdb-devel aarch64 8.10.0-3.fc40 fedora 1.4 MiB
 sleef-devel aarch64 3.6-20240320.0.git60e76d2b.fc40 copr_base 192.2 KiB
 snappy-devel aarch64 1.1.10-4.fc40 fedora 45.2 KiB
 tbb-devel aarch64 2021.11.0-5.fc40 fedora 1.3 MiB
 tensorpipe-devel aarch64 0-20220513.1.gitbb1473a4.fc37 copr_base 489.8 KiB
 zeromq-devel aarch64 4.3.5-16.fc40 fedora 30.5 KiB
Installing dependencies:
 MUMPS aarch64 5.6.2-3.fc40 fedora 8.4 MiB
 MUMPS-common noarch 5.6.2-3.fc40 fedora 948.0 KiB
 SuperLU aarch64 6.0.1-3.fc40 fedora 522.3 KiB
 abattis-cantarell-vf-fonts noarch 0.301-12.fc40 fedora 192.7 KiB
 adobe-mappings-cmap noarch 20230622-3.fc40 fedora 14.4 MiB
 adobe-mappings-cmap-deprecated noarch 20230622-3.fc40 fedora 582.1 KiB
 adobe-mappings-pdf noarch 20190401-7.fc40 fedora 4.4 MiB
 alsa-lib aarch64 1.2.11-2.fc40 fedora 1.8 MiB
 annobin-docs noarch 12.42-1.fc40 fedora 95.6 KiB
 annobin-plugin-gcc aarch64 12.42-1.fc40 fedora 1.1 MiB
 armadillo aarch64 12.8.1-1.fc40 fedora 210.3 KiB
 arpack aarch64 3.9.1-3.fc40 fedora 809.9 KiB
 asl aarch64 20240106-1.20240201git2f5d9de.fc40 fedora 2.5 MiB
 asmjit aarch64 1:0-20220702.1.gitc5984762.fc40 copr_base 461.2 KiB
 avahi-libs aarch64 0.8-26.fc40 fedora 614.2 KiB
 blosc aarch64 1.21.5-4.fc40 fedora 257.7 KiB
 cairo aarch64 1.18.0-3.fc40 fedora 2.0 MiB
 cairo-gobject aarch64 1.18.0-3.fc40 fedora 195.2 KiB
 cdparanoia-libs aarch64 10.2-44.fc40 fedora 393.6 KiB
 ceres-solver aarch64 2.2.0-4.fc40 fedora 5.2 MiB
 cfitsio aarch64 4.4.0-2.fc40 fedora 1.8 MiB
 cgnslib-libs aarch64 4.4.0-4.fc40 fedora 918.2 KiB
 cjson aarch64 1.7.15-4.fc40 fedora 223.7 KiB
 cliquer-libs aarch64 1.22-8.fc40 fedora 215.6 KiB
 cmake aarch64 3.28.2-1.fc40 fedora 28.6 MiB
 cmake-data noarch 3.28.2-1.fc40 fedora 8.0 MiB
 cmake-filesystem aarch64 3.28.2-1.fc40 fedora 0.0 B
 cmake-rpm-macros noarch 3.28.2-1.fc40 fedora 7.4 KiB
 codec2 aarch64 1.2.0-4.fc40 fedora 1.4 MiB
 coin-or-Cbc aarch64 2.10.11-2.fc40 fedora 2.6 MiB
 coin-or-Cgl aarch64 0.60.8-1.fc40 fedora 994.8 KiB
 coin-or-Clp aarch64 1.17.9-1.fc40 fedora 2.7 MiB
 coin-or-CoinUtils aarch64 2.11.10-1.fc40 fedora 1.2 MiB
 coin-or-Osi aarch64 0.108.9-2.fc40 fedora 5.6 MiB
 cpp aarch64 14.0.1-0.13.fc40 fedora 31.8 MiB
 cpuinfo aarch64 1:0-20240327.0.gitf42f5eaf.fc40 copr_base 793.8 KiB
 crypto-policies-scripts noarch 20240201-2.git9f501f3.fc40 fedora 313.8 KiB
 cuda-cccl-12-3 aarch64 12.3.101-1 http_developer_download_nvidia_com_compute_cuda_repos_rhel8_sbsa 13.8 MiB
 cuda-crt-12-3 aarch64 12.3.107-1 http_developer_download_nvidia_com_compute_cuda_repos_rhel8_sbsa 1.0 MiB
 cuda-cudart-12-3 aarch64 12.3.101-1 http_developer_download_nvidia_com_compute_cuda_repos_rhel8_sbsa 735.3 KiB
 cuda-gcc-12 aarch64 12.3.1-1.fc39 copr_base 100.5 MiB
 cuda-nvrtc-12-3 aarch64 12.3.107-1 http_developer_download_nvidia_com_compute_cuda_repos_rhel8_sbsa 60.4 MiB
 cuda-nvvm-12-3 aarch64 12.3.107-1 http_developer_download_nvidia_com_compute_cuda_repos_rhel8_sbsa 58.1 MiB
 cuda-toolkit-12-3-config-common noarch 12.3.101-1 http_developer_download_nvidia_com_compute_cuda_repos_rhel8_x86_64 0.0 B
 cuda-toolkit-12-config-common noarch 12.4.127-1 http_developer_download_nvidia_com_compute_cuda_repos_rhel8_x86_64 44.0 B
 cuda-toolkit-config-common noarch 12.4.127-1 http_developer_download_nvidia_com_compute_cuda_repos_rhel8_x86_64 41.0 B
 cups-libs aarch64 1:2.4.7-11.fc40 fedora 923.0 KiB
 cutlass aarch64 3.4.1-20240215.0.cu12_3.fc40 copr_base 1.0 GiB
 dbus aarch64 1:1.14.10-3.fc40 fedora 0.0 B
 dbus-broker aarch64 35-4.fc40 fedora 614.1 KiB
 dbus-common noarch 1:1.14.10-3.fc40 fedora 11.2 KiB
 dbus-libs aarch64 1:1.14.10-3.fc40 fedora 489.0 KiB
 default-fonts-core-sans noarch 4.0-12.fc40 fedora 11.9 KiB
 double-conversion aarch64 3.3.0-3.fc40 fedora 204.4 KiB
 duktape aarch64 2.7.0-7.fc40 fedora 928.1 KiB
 emacs-filesystem noarch 1:29.2-3.fc40 fedora 0.0 B
 expat aarch64 2.6.0-1.fc40 fedora 532.8 KiB
 fdk-aac-free aarch64 2.0.0-13.fc40 fedora 655.3 KiB
 fftw aarch64 3.3.10-11.fc40 fedora 605.5 KiB
 fftw-libs aarch64 3.3.10-11.fc40 fedora 0.0 B
 fftw-libs-double aarch64 3.3.10-11.fc40 fedora 2.3 MiB
 fftw-libs-long aarch64 3.3.10-11.fc40 fedora 2.7 MiB
 fftw-libs-single aarch64 3.3.10-11.fc40 fedora 2.4 MiB
 flatbuffers aarch64 23.5.26-6.fc40 fedora 598.3 KiB
 flexiblas aarch64 3.4.2-1.fc40 fedora 46.9 KiB
 flexiblas-netlib aarch64 3.4.2-1.fc40 fedora 9.6 MiB
 flexiblas-netlib64 aarch64 3.4.2-1.fc40 fedora 9.5 MiB
 flexiblas-openblas-openmp aarch64 3.4.2-1.fc40 fedora 195.3 KiB
 flexiblas-openblas-openmp64 aarch64 3.4.2-1.fc40 fedora 195.3 KiB
 fontconfig aarch64 2.15.0-4.fc40 fedora 2.4 MiB
 fonts-filesystem noarch 1:2.0.5-14.fc40 fedora 0.0 B
 foxi aarch64 0-20210526.1.gitc278588e.fc37 copr_base 68.8 KiB
 fp16 aarch64 1:0-20240410.0.git581ac1c7.fc40 copr_base 198.3 KiB
 freetype aarch64 2.13.2-5.fc40 fedora 942.9 KiB
 freexl aarch64 2.0.0-7.fc40 fedora 221.4 KiB
 fribidi aarch64 1.0.13-4.fc40 fedora 673.1 KiB
 game-music-emu aarch64 0.6.3-14.fc40 fedora 362.4 KiB
 gc aarch64 8.2.2-6.fc40 fedora 850.3 KiB
 gcc aarch64 14.0.1-0.13.fc40 fedora 93.3 MiB
 gcc-plugin-annobin aarch64 14.0.1-0.13.fc40 fedora 197.0 KiB
 gd aarch64 2.3.3-16.fc40 fedora 515.6 KiB
 gdal-libs aarch64 3.8.4-2.fc40 fedora 25.9 MiB
 gdk-pixbuf2 aarch64 2.42.10-8.fc40 fedora 2.9 MiB
 gdk-pixbuf2-modules aarch64 2.42.10-8.fc40 fedora 2.1 MiB
 geos aarch64 3.12.1-3.fc40 fedora 3.8 MiB
 gflags aarch64 2.2.2-14.fc40 fedora 556.3 KiB
 giflib aarch64 5.2.2-1.fc40 fedora 260.2 KiB
 git-core aarch64 2.44.0-1.fc40 fedora 21.8 MiB
 git-core-doc noarch 2.44.0-1.fc40 fedora 16.8 MiB
 gklib aarch64 5.1.1-20230326.0.git8bd6bad7.fc39 copr_base 334.0 KiB
 gl-manpages noarch 1.1-31.20190306.fc40 fedora 935.5 KiB
 glib2 aarch64 2.80.0-1.fc40 fedora 16.4 MiB
 glibc-devel aarch64 2.39.9999-99.fc40 copr_base 2.2 MiB
 glog aarch64 0.3.5-20.fc40 fedora 267.4 KiB
 gloo aarch64 1:0.5.0-20240411.0.git6c70a556.cu12_3.fc40 copr_base 3.8 MiB
 glpk aarch64 5.0-11.fc40 fedora 878.7 KiB
 glx-utils aarch64 9.0.0-6.fc40 fedora 846.9 KiB
 gmp-c++ aarch64 1:6.2.1-8.fc40 fedora 195.5 KiB
 gnupg2 aarch64 2.4.4-1.fc40 fedora 12.3 MiB
 gnutls aarch64 3.8.3-2.fc40 fedora 3.4 MiB
 google-droid-sans-fonts noarch 20200215-19.fc40 fedora 6.3 MiB
 google-noto-fonts-common noarch 20240301-2.fc40 fedora 17.5 KiB
 google-noto-sans-vf-fonts noarch 20240301-2.fc40 fedora 1.2 MiB
 gpgme aarch64 1.23.2-3.fc40 fedora 810.8 KiB
 gpgmepp aarch64 1.23.2-3.fc40 fedora 521.8 KiB
 graphene aarch64 1.10.6-8.fc40 fedora 242.6 KiB
 graphite2 aarch64 1.3.14-15.fc40 fedora 495.7 KiB
 graphviz aarch64 9.0.0-11.fc40 fedora 27.6 MiB
 groff-base aarch64 1.23.0-6.fc40 fedora 5.4 MiB
 gsm aarch64 1.0.22-6.fc40 fedora 204.7 KiB
 gstreamer1 aarch64 1.22.9-1.fc40 fedora 6.7 MiB
 gstreamer1-plugins-base aarch64 1.22.9-1.fc40 fedora 12.5 MiB
 gts aarch64 0.7.6-48.20121130.fc40 fedora 2.4 MiB
 guile30 aarch64 3.0.7-12.fc40 fedora 52.0 MiB
 halide aarch64 17.0.1-20240220.0.fc40 copr_base 133.3 MiB
 harfbuzz aarch64 8.3.0-5.fc40 fedora 2.9 MiB
 hdf-libs aarch64 4.2.16.2-1.fc40 fedora 851.0 KiB
 hdf5 aarch64 1.12.1-15.fc40 fedora 12.4 MiB
 highway aarch64 1.1.0-1.fc40 fedora 793.0 KiB
 hiredis aarch64 1.0.2-7.fc40 fedora 198.3 KiB
 hwloc-libs aarch64 2.10.0-3.fc40 fedora 2.9 MiB
 ilbc aarch64 3.0.4-10.fc40 fedora 207.4 KiB
 imath aarch64 3.1.10-1.fc40 fedora 508.8 KiB
 infiniband-diags aarch64 48.0-4.fc40 fedora 4.3 MiB
 isl aarch64 0.16.1-20.fc40 fedora 3.4 MiB
 iso-codes noarch 4.16.0-3.fc40 fedora 18.8 MiB
 jbig2dec-libs aarch64 0.20-4.fc40 fedora 301.0 KiB
 jbigkit-libs aarch64 2.1-29.fc40 fedora 437.5 KiB
 json-c aarch64 0.17-3.fc40 fedora 202.3 KiB
 jsoncpp aarch64 1.9.5-7.fc40 fedora 335.6 KiB
 kernel-headers aarch64 6.8.3-300.fc40 fedora 6.1 MiB
 keyutils-libs-devel aarch64 1.6.3-3.fc40 fedora 48.2 KiB
 kineto aarch64 0.4.0-20240327.0.git445909a8.cu12_3.fc40 copr_base 787.1 KiB
 kmod-libs aarch64 31-5.fc40 fedora 287.1 KiB
 krb5-devel aarch64 1.21.2-5.fc40 fedora 706.6 KiB
 lame-libs aarch64 3.100-17.fc40 fedora 1.3 MiB
 lasi aarch64 1.1.3-13.fc40 fedora 258.4 KiB
 lcms2 aarch64 2.16-3.fc40 fedora 484.8 KiB
 less aarch64 643-4.fc40 fedora 800.3 KiB
 leveldb aarch64 1.23-9.fc40 fedora 406.8 KiB
 libGLEW aarch64 2.2.0-7.fc40 fedora 840.4 KiB
 libICE aarch64 1.1.1-3.fc40 fedora 273.0 KiB
 libSM aarch64 1.2.4-3.fc40 fedora 253.3 KiB
 libX11 aarch64 1.8.7-3.fc40 fedora 1.3 MiB
 libX11-common noarch 1.8.7-3.fc40 fedora 1.1 MiB
 libX11-devel aarch64 1.8.7-3.fc40 fedora 1.0 MiB
 libX11-xcb aarch64 1.8.7-3.fc40 fedora 195.0 KiB
 libXau aarch64 1.0.11-6.fc40 fedora 242.8 KiB
 libXau-devel aarch64 1.0.11-6.fc40 fedora 6.4 KiB
 libXcursor aarch64 1.2.1-7.fc40 fedora 197.4 KiB
 libXext aarch64 1.3.6-1.fc40 fedora 209.9 KiB
libXfixes aarch64 6.0.1-3.fc40 fedora 198.3 KiB libXft aarch64 2.3.8-6.fc40 fedora 256.4 KiB libXi aarch64 1.8.1-5.fc40 fedora 200.6 KiB libXpm aarch64 3.5.17-3.fc40 fedora 264.4 KiB libXrender aarch64 0.9.11-6.fc40 fedora 198.1 KiB libXt aarch64 1.3.0-3.fc40 fedora 605.5 KiB libXv aarch64 1.0.12-3.fc40 fedora 198.0 KiB libXxf86vm aarch64 1.1.5-6.fc40 fedora 197.3 KiB libaec aarch64 1.1.2-1.fc40 fedora 410.0 KiB libaom aarch64 3.8.2-1.fc40 fedora 3.7 MiB libarrow aarch64 15.0.2-3.fc40 fedora 19.5 MiB libarrow-doc noarch 15.0.2-3.fc40 fedora 115.4 KiB libasan aarch64 14.0.1-0.13.fc40 fedora 1.6 MiB libassuan aarch64 2.5.7-1.fc40 fedora 279.7 KiB libatomic aarch64 14.0.1-0.13.fc40 fedora 196.9 KiB libavcodec-free aarch64 6.1.1-8.fc40 fedora 9.6 MiB libavformat-free aarch64 6.1.1-8.fc40 fedora 2.7 MiB libavif aarch64 1.0.4-1.fc40 fedora 279.8 KiB libavutil-free aarch64 6.1.1-8.fc40 fedora 933.8 KiB libb2 aarch64 0.98.1-11.fc40 fedora 202.1 KiB libbluray aarch64 1.3.4-5.fc40 fedora 493.8 KiB libcbor aarch64 0.11.0-1.fc40 fedora 201.9 KiB libchromaprint aarch64 1.5.1-17.fc40 fedora 208.5 KiB libcom_err-devel aarch64 1.47.0-5.fc40 fedora 16.7 KiB libcublas-12-3 aarch64 12.3.4.1-2 copr_rezso_CUDA 584.0 MiB libcudnn8 aarch64 8.9.7.29-2.cuda12.3 copr_rezso_CUDA 1.0 GiB libcufft-12-3 aarch64 11.0.12.1-2 copr_rezso_CUDA 169.7 MiB libcurand-12-3 aarch64 10.3.4.107-1 http_developer_download_nvidia_com_compute_cuda_repos_rhel8_sbsa 91.7 MiB libcusolver-12-3 aarch64 11.5.4.101-2 copr_rezso_CUDA 185.5 MiB libcusparse-12-3 aarch64 12.2.0.103-2 copr_rezso_CUDA 251.8 MiB libdatrie aarch64 0.2.13-9.fc40 fedora 221.9 KiB libdav1d aarch64 1.4.0-1.fc40 fedora 920.1 KiB libdc1394 aarch64 2.2.7-5.fc40 fedora 442.9 KiB libdeflate aarch64 1.20-1.fc40 fedora 224.6 KiB libdicom aarch64 1.0.5-3.fc40 fedora 518.1 KiB libdrm aarch64 2.4.120-3.fc40 fedora 1.4 MiB libedit aarch64 3.1-50.20230828cvs.fc40 fedora 343.8 KiB libevdev aarch64 1.13.1-4.fc40 fedora 198.1 KiB libfido2 aarch64 1.14.0-4.fc40 
fedora 341.9 KiB libgcrypt aarch64 1.10.3-3.fc40 fedora 1.1 MiB libgeotiff aarch64 1.7.1-12.fc40 fedora 1.1 MiB libgfortran aarch64 14.0.1-0.13.fc40 fedora 1.5 MiB libglvnd aarch64 1:1.7.0-4.fc40 fedora 1.7 MiB libglvnd-core-devel aarch64 1:1.7.0-4.fc40 fedora 40.3 KiB libglvnd-devel aarch64 1:1.7.0-4.fc40 fedora 2.1 MiB libglvnd-egl aarch64 1:1.7.0-4.fc40 fedora 196.8 KiB libglvnd-gles aarch64 1:1.7.0-4.fc40 fedora 650.0 KiB libglvnd-glx aarch64 1:1.7.0-4.fc40 fedora 1.3 MiB libglvnd-opengl aarch64 1:1.7.0-4.fc40 fedora 521.0 KiB libgpg-error aarch64 1.48-1.fc40 fedora 1.1 MiB libgs aarch64 10.02.1-8.fc40 fedora 23.6 MiB libgta aarch64 1.2.1-12.fc40 fedora 222.1 KiB libgudev aarch64 238-5.fc40 fedora 231.8 KiB libharu aarch64 2.4.3-5.fc40 fedora 1.8 MiB libibumad aarch64 48.0-4.fc40 fedora 195.9 KiB libibverbs aarch64 48.0-4.fc40 fedora 3.9 MiB libicu aarch64 74.2-1.fc40 fedora 35.9 MiB libijs aarch64 0.35-22.fc40 fedora 229.6 KiB libimagequant aarch64 4.0.3-3.fc40 fedora 730.5 KiB libinput aarch64 1.25.0-3.fc40 fedora 1.7 MiB libjpeg-turbo aarch64 3.0.2-1.fc40 fedora 792.4 KiB libjxl aarch64 1:0.8.2-6.fc40 fedora 2.1 MiB libkadm5 aarch64 1.21.2-5.fc40 fedora 458.1 KiB libkml aarch64 1.3.0-47.fc40 fedora 1.9 MiB libksba aarch64 1.6.6-1.fc40 fedora 524.8 KiB libldb aarch64 2.9.0-1.fc40 fedora 3.0 MiB liblerc aarch64 4.0.0-6.fc40 fedora 610.4 KiB libmodplug aarch64 1:0.8.9.0-19.fc40 fedora 411.1 KiB libmpc aarch64 1.3.1-5.fc40 fedora 280.7 KiB libnauty aarch64 2.8.8-3.fc40 fedora 5.1 MiB libnccl aarch64 2.21.5-1+cuda12.4 http_developer_download_nvidia_com_compute_cuda_repos_rhel8_sbsa 230.3 MiB libnl3 aarch64 3.9.0-3.fc40 fedora 1.7 MiB libnpp-12-3 aarch64 12.2.3.2-2 copr_rezso_CUDA 234.8 MiB libnvjitlink-12-3 aarch64 12.3.101-1 http_developer_download_nvidia_com_compute_cuda_repos_rhel8_sbsa 46.0 MiB libogg aarch64 2:1.3.5-8.fc40 fedora 205.4 KiB libopenmpt aarch64 0.7.3-3.fc40 fedora 1.6 MiB liborc1 aarch64 1.9.3-1.fc40 fedora 1.7 MiB libpaper aarch64 
1:2.1.1-3.fc40 fedora 224.8 KiB libpng aarch64 2:1.6.40-3.fc40 fedora 333.6 KiB libpq aarch64 16.1-4.fc40 fedora 1.0 MiB libproxy aarch64 0.5.3-5.fc40 fedora 431.1 KiB libqhull_r aarch64 1:8.0.2-4.fc40 fedora 583.4 KiB librabbitmq aarch64 0.13.0-5.fc40 fedora 197.7 KiB libraw1394 aarch64 2.1.2-20.fc40 fedora 822.9 KiB librdmacm aarch64 48.0-4.fc40 fedora 462.1 KiB librist aarch64 0.2.7-4.fc40 fedora 269.4 KiB librsvg2 aarch64 2.57.1-4.fc40 fedora 4.4 MiB librttopo aarch64 1.1.0-14.fc40 fedora 544.8 KiB libseccomp aarch64 2.5.3-8.fc40 fedora 243.2 KiB libselinux-devel aarch64 3.6-4.fc40 fedora 126.1 KiB libsepol-devel aarch64 3.6-3.fc40 fedora 120.2 KiB libsmbclient aarch64 2:4.20.0-0.5.rc4.fc40 fedora 174.5 KiB libsodium aarch64 1.0.19-4.fc40 fedora 392.1 KiB libsodium-devel aarch64 1.0.19-4.fc40 fedora 3.8 MiB libspatialite aarch64 5.1.0-5.fc40 fedora 15.7 MiB libstdc++-devel aarch64 14.0.1-0.13.fc40 fedora 15.1 MiB libswresample-free aarch64 6.1.1-8.fc40 fedora 218.6 KiB libswscale-free aarch64 6.1.1-8.fc40 fedora 480.5 KiB libtalloc aarch64 2.4.2-1.fc40 fedora 196.4 KiB libtdb aarch64 1.4.10-1.fc40 fedora 197.1 KiB libtevent aarch64 0.16.1-1.fc40 fedora 197.9 KiB libthai aarch64 0.1.29-8.fc40 fedora 935.4 KiB libtheora aarch64 1:1.1.1-36.fc40 fedora 852.9 KiB libtiff aarch64 4.6.0-2.fc40 fedora 1.7 MiB libubsan aarch64 14.0.1-0.13.fc40 fedora 539.3 KiB libudfread aarch64 1.1.2-8.fc40 fedora 222.0 KiB libunwind aarch64 1.8.0-3.fc40 fedora 605.6 KiB libunwind-devel aarch64 1.8.0-3.fc40 fedora 145.4 KiB liburing aarch64 2.5-3.fc40 fedora 419.0 KiB libusb1 aarch64 1.0.27-1.fc40 fedora 242.2 KiB libuv aarch64 1:1.48.0-1.fc40 fedora 650.5 KiB libuv-static aarch64 1:1.48.0-1.fc40 fedora 419.7 KiB libva aarch64 2.21.0-3.fc40 fedora 1.1 MiB libvdpau aarch64 1.5-6.fc40 fedora 196.8 KiB libverto-devel aarch64 0.3.2-8.fc40 fedora 25.7 KiB libvisual aarch64 1:0.4.1-4.fc40 fedora 551.3 KiB libvorbis aarch64 1:1.3.7-10.fc40 fedora 1.3 MiB libvpx aarch64 1.14.0-1.fc40 fedora 
2.6 MiB libwacom aarch64 2.10.0-1.fc40 fedora 406.4 KiB libwacom-data noarch 2.10.0-1.fc40 fedora 613.0 KiB libwayland-client aarch64 1.22.0-3.fc40 fedora 198.1 KiB libwayland-cursor aarch64 1.22.0-3.fc40 fedora 209.0 KiB libwayland-egl aarch64 1.22.0-3.fc40 fedora 196.5 KiB libwayland-server aarch64 1.22.0-3.fc40 fedora 198.5 KiB libwbclient aarch64 2:4.20.0-0.5.rc4.fc40 fedora 75.0 KiB libwebp aarch64 1.3.2-5.fc40 fedora 1.2 MiB libxcb aarch64 1.16-4.fc40 fedora 5.0 MiB libxcb-devel aarch64 1.16-4.fc40 fedora 2.7 MiB libxcrypt-devel aarch64 4.4.36-5.fc40 fedora 30.3 KiB libxkbcommon aarch64 1.6.0-2.fc40 fedora 596.3 KiB libxkbcommon-x11 aarch64 1.6.0-2.fc40 fedora 195.6 KiB libxshmfence aarch64 1.3.2-3.fc40 fedora 195.3 KiB libyaml aarch64 0.2.5-14.fc40 fedora 262.5 KiB llvm17-libs aarch64 17.0.6-7.fc40 fedora 110.6 MiB lmdb aarch64 0.9.32-1.fc40 fedora 786.3 KiB lmdb-libs aarch64 0.9.32-1.fc40 fedora 209.2 KiB lpcnetfreedv aarch64 0.5-5.fc40 fedora 14.9 MiB magma aarch64 2.8.0-20240328.0.cu12_3.fc40 copr_base 230.9 MiB make aarch64 1:4.4.1-6.fc40 fedora 1.8 MiB mariadb-connector-c aarch64 3.3.8-3.fc40 fedora 2.0 MiB mariadb-connector-c-config noarch 3.3.8-3.fc40 fedora 497.0 B mbedtls aarch64 2.28.7-1.fc40 fedora 1.4 MiB mesa-filesystem aarch64 24.0.4-1.fc40 fedora 3.6 KiB mesa-libEGL aarch64 24.0.4-1.fc40 fedora 395.6 KiB mesa-libGL aarch64 24.0.4-1.fc40 fedora 725.7 KiB mesa-libGLU aarch64 9.0.3-4.fc40 fedora 393.5 KiB mesa-libgbm aarch64 24.0.4-1.fc40 fedora 197.3 KiB mesa-libglapi aarch64 24.0.4-1.fc40 fedora 460.8 KiB metis aarch64 5.2.1-20230403.0.gite0f1b88b.fc39 copr_base 1.5 MiB miniz aarch64 3.0.2-5.fc40 fedora 220.0 KiB minizip-ng-compat aarch64 3.0.10-7.fc40 fedora 262.6 KiB mpdecimal aarch64 2.5.1-9.fc40 fedora 328.7 KiB mpg123-libs aarch64 1.31.3-4.fc40 fedora 1.0 MiB mtdev aarch64 1.1.6-8.fc40 fedora 197.3 KiB ncurses aarch64 6.4-12.20240127.fc40 fedora 1.7 MiB netcdf aarch64 4.9.2-5.fc40 fedora 4.7 MiB netpbm aarch64 11.02.00-6.fc40 fedora 629.1 
KiB nettle aarch64 3.9.1-6.fc40 fedora 953.6 KiB nnpack aarch64 0-20230201.0.git70a77f48.fc38 copr_base 271.8 KiB npth aarch64 1.7-1.fc40 fedora 221.5 KiB nspr aarch64 4.35.0-21.fc40 fedora 740.6 KiB nss aarch64 3.98.0-1.fc40 fedora 2.2 MiB nss-softokn aarch64 3.98.0-1.fc40 fedora 2.6 MiB nss-softokn-freebl aarch64 3.98.0-1.fc40 fedora 995.6 KiB nss-sysinit aarch64 3.98.0-1.fc40 fedora 198.3 KiB nss-util aarch64 3.98.0-1.fc40 fedora 346.2 KiB numactl-libs aarch64 2.0.16-5.fc40 fedora 197.0 KiB ocl-icd aarch64 2.3.2-5.fc40 fedora 282.8 KiB ogdi aarch64 4.1.1-1.fc40 fedora 2.1 MiB onnx-libs aarch64 1.17.0-20240404.0.git4128a090.fc40 copr_base 3.0 MiB onnx-optimizer aarch64 0.3.19-20240303.0.gitb3a46118.fc40 copr_base 1.0 MiB openblas aarch64 0.3.26-4.fc40 fedora 96.0 KiB openblas-openmp64 aarch64 0.3.26-4.fc40 fedora 19.3 MiB openblas-openmp64_ aarch64 0.3.26-4.fc40 fedora 19.3 MiB openblas-serial aarch64 0.3.26-4.fc40 fedora 18.4 MiB openblas-serial64 aarch64 0.3.26-4.fc40 fedora 18.3 MiB openblas-serial64_ aarch64 0.3.26-4.fc40 fedora 18.3 MiB openblas-threads aarch64 0.3.26-4.fc40 fedora 19.5 MiB openblas-threads64 aarch64 0.3.26-4.fc40 fedora 19.3 MiB openblas-threads64_ aarch64 0.3.26-4.fc40 fedora 19.3 MiB opencl-headers noarch 3.0-21.20231212git2368105.fc40 fedora 722.6 KiB opencore-amr aarch64 0.1.6-6.fc40 fedora 549.2 KiB opencv aarch64 4.9.0-20231227.1.cu12_3.fc40 copr_base 21.9 MiB opencv-contrib aarch64 4.9.0-20231227.1.cu12_3.fc40 copr_base 17.8 MiB opencv-core aarch64 4.9.0-20231227.1.cu12_3.fc40 copr_base 50.0 MiB opencv-cuda aarch64 4.9.0-20231227.1.cu12_3.fc40 copr_base 573.5 MiB opencv-static aarch64 4.9.0-20231227.1.cu12_3.fc40 copr_base 2.5 MiB openexr-libs aarch64 3.1.10-5.fc40 fedora 6.9 MiB openjpeg2 aarch64 2.5.2-1.fc40 fedora 537.6 KiB openpgm aarch64 5.2.122-34.fc40 fedora 416.3 KiB openpgm-devel aarch64 5.2.122-34.fc40 fedora 339.7 KiB openslide aarch64 4.0.0-3.fc40 fedora 430.7 KiB openssh aarch64 9.6p1-1.fc40.2 fedora 2.0 MiB 
openssh-clients aarch64 9.6p1-1.fc40.2 fedora 3.5 MiB opus aarch64 1.5.1-1.fc40 fedora 519.8 KiB orc aarch64 0.4.38-2.fc40 fedora 1.2 MiB pango aarch64 1.51.2-1.fc40 fedora 1.9 MiB pcre aarch64 8.45-1.fc40.6 fedora 745.7 KiB pcre2-devel aarch64 10.42-2.fc40.2 fedora 1.9 MiB pcre2-utf16 aarch64 10.42-2.fc40.2 fedora 646.1 KiB pcre2-utf32 aarch64 10.42-2.fc40.2 fedora 582.0 KiB perl-AutoLoader noarch 5.74-506.fc40 fedora 20.5 KiB perl-B aarch64 1.88-506.fc40 fedora 604.3 KiB perl-Carp noarch 1.54-502.fc40 fedora 46.5 KiB perl-Class-Struct noarch 0.68-506.fc40 fedora 25.4 KiB perl-Data-Dumper aarch64 2.188-503.fc40 fedora 263.6 KiB perl-Digest noarch 1.20-502.fc40 fedora 35.2 KiB perl-Digest-MD5 aarch64 2.59-3.fc40 fedora 231.7 KiB perl-DynaLoader aarch64 1.54-506.fc40 fedora 32.1 KiB perl-Encode aarch64 4:3.21-505.fc40 fedora 10.9 MiB perl-Errno aarch64 1.37-506.fc40 fedora 8.4 KiB perl-Error noarch 1:0.17029-15.fc40 fedora 77.2 KiB perl-Exporter noarch 5.78-3.fc40 fedora 54.2 KiB perl-Fcntl aarch64 1.15-506.fc40 fedora 200.6 KiB perl-File-Basename noarch 2.86-506.fc40 fedora 14.0 KiB perl-File-Find noarch 1.43-506.fc40 fedora 41.9 KiB perl-File-Path noarch 2.18-503.fc40 fedora 63.5 KiB perl-File-Temp noarch 1:0.231.100-503.fc40 fedora 162.3 KiB perl-File-stat noarch 1.13-506.fc40 fedora 12.7 KiB perl-FileHandle noarch 2.05-506.fc40 fedora 9.3 KiB perl-Getopt-Long noarch 1:2.57-3.fc40 fedora 144.1 KiB perl-Getopt-Std noarch 1.13-506.fc40 fedora 11.1 KiB perl-Git noarch 2.44.0-1.fc40 fedora 64.0 KiB perl-HTTP-Tiny noarch 0.088-5.fc40 fedora 152.1 KiB perl-IO aarch64 1.52-506.fc40 fedora 319.0 KiB perl-IO-Socket-IP noarch 0.42-2.fc40 fedora 98.6 KiB perl-IO-Socket-SSL noarch 2.085-1.fc40 fedora 685.0 KiB perl-IPC-Open3 noarch 1.22-506.fc40 fedora 22.4 KiB perl-MIME-Base64 aarch64 3.16-503.fc40 fedora 222.0 KiB perl-Mozilla-CA noarch 20231213-3.fc40 fedora 9.1 KiB perl-Net-SSLeay aarch64 1.94-3.fc40 fedora 1.4 MiB perl-POSIX aarch64 2.13-506.fc40 fedora 325.0 KiB 
perl-PathTools aarch64 3.89-502.fc40 fedora 351.6 KiB perl-Pod-Escapes noarch 1:1.07-503.fc40 fedora 24.9 KiB perl-Pod-Perldoc noarch 3.28.01-503.fc40 fedora 163.1 KiB perl-Pod-Simple noarch 1:3.45-6.fc40 fedora 559.8 KiB perl-Pod-Usage noarch 4:2.03-503.fc40 fedora 84.7 KiB perl-Scalar-List-Utils aarch64 5:1.63-503.fc40 fedora 277.4 KiB perl-SelectSaver noarch 1.02-506.fc40 fedora 2.2 KiB perl-Socket aarch64 4:2.037-5.fc40 fedora 271.6 KiB perl-Storable aarch64 1:3.32-502.fc40 fedora 372.3 KiB perl-Symbol noarch 1.09-506.fc40 fedora 6.8 KiB perl-Term-ANSIColor noarch 5.01-504.fc40 fedora 97.5 KiB perl-Term-Cap noarch 1.18-503.fc40 fedora 29.3 KiB perl-TermReadKey aarch64 2.38-21.fc40 fedora 236.0 KiB perl-Text-ParseWords noarch 3.31-502.fc40 fedora 13.5 KiB perl-Text-Tabs+Wrap noarch 2024.001-1.fc40 fedora 22.5 KiB perl-Time-Local noarch 2:1.350-5.fc40 fedora 68.9 KiB perl-URI noarch 5.27-1.fc40 fedora 239.8 KiB perl-base noarch 2.27-506.fc40 fedora 12.5 KiB perl-constant noarch 1.33-503.fc40 fedora 26.2 KiB perl-if noarch 0.61.000-506.fc40 fedora 5.8 KiB perl-interpreter aarch64 4:5.38.2-506.fc40 fedora 299.7 KiB perl-lib aarch64 0.65-506.fc40 fedora 8.5 KiB perl-libnet noarch 3.15-503.fc40 fedora 289.0 KiB perl-libs aarch64 4:5.38.2-506.fc40 fedora 11.2 MiB perl-locale noarch 1.10-506.fc40 fedora 6.2 KiB perl-mro aarch64 1.28-506.fc40 fedora 209.6 KiB perl-overload noarch 1.37-506.fc40 fedora 71.5 KiB perl-overloading noarch 0.02-506.fc40 fedora 4.8 KiB perl-parent noarch 1:0.241-502.fc40 fedora 9.7 KiB perl-podlators noarch 1:5.01-502.fc40 fedora 308.1 KiB perl-vars noarch 1.05-506.fc40 fedora 3.9 KiB pixman aarch64 0.43.0-3.fc40 fedora 718.3 KiB poppler aarch64 24.02.0-2.fc40 fedora 3.9 MiB poppler-data noarch 0.4.11-7.fc40 fedora 12.3 MiB poppler-glib aarch64 24.02.0-2.fc40 fedora 665.8 KiB proj aarch64 9.3.1-3.fc40 fedora 5.2 MiB proj-data noarch 9.3.1-3.fc40 fedora 8.5 MiB protobuf aarch64 3.19.6-8.fc40 fedora 3.2 MiB protobuf-compat aarch64 3.21.9-2.fc39 
copr_base 3.6 MiB pthreadpool aarch64 1:0.1-20240121.0.git178e3e06.fc40 copr_base 199.1 KiB pugixml aarch64 1.13-5.fc40 fedora 329.2 KiB pyproject-rpm-macros noarch 1.12.0-1.fc40 fedora 98.8 KiB python-pip-wheel noarch 23.3.2-1.fc40 fedora 1.5 MiB python-rpm-macros noarch 3.12-7.fc40 fedora 22.1 KiB python3 aarch64 3.12.2-2.fc40 fedora 211.8 KiB python3-libs aarch64 3.12.2-2.fc40 fedora 51.9 MiB python3-packaging noarch 23.2-4.fc40 fedora 421.1 KiB python3-rpm-generators noarch 14-10.fc40 fedora 81.7 KiB python3-rpm-macros noarch 3.12-7.fc40 fedora 6.4 KiB qnnpack aarch64 0-20190828.2.git7d2a4e99.fc38 copr_base 206.0 KiB qt-settings noarch 40.0-1.fc40 fedora 1.1 KiB qt5-qtbase aarch64 5.15.13-1.fc40 fedora 11.4 MiB qt5-qtbase-common noarch 5.15.13-1.fc40 fedora 78.0 B qt5-qtbase-gui aarch64 5.15.13-1.fc40 fedora 24.4 MiB rav1e-libs aarch64 0.7.1-1.fc40 fedora 2.1 MiB re2 aarch64 1:20220601-5.fc40 fedora 654.0 KiB rhash aarch64 1.4.3-4.fc40 fedora 584.6 KiB rocksdb aarch64 8.10.0-3.fc40 fedora 8.9 MiB rsvg-pixbuf-loader aarch64 2.57.1-4.fc40 fedora 195.5 KiB samba-client-libs aarch64 2:4.20.0-0.5.rc4.fc40 fedora 20.6 MiB samba-common noarch 2:4.20.0-0.5.rc4.fc40 fedora 141.1 KiB samba-common-libs aarch64 2:4.20.0-0.5.rc4.fc40 fedora 267.8 KiB scotch aarch64 7.0.4-3.fc40 fedora 1.2 MiB scotch-devel aarch64 7.0.4-3.fc40 fedora 455.7 KiB shared-mime-info aarch64 2.3-4.fc40 fedora 5.3 MiB sleef aarch64 3.6-20240320.0.git60e76d2b.fc40 copr_base 1.5 MiB snappy aarch64 1.1.10-4.fc40 fedora 211.0 KiB soxr aarch64 0.1.3-15.fc40 fedora 463.1 KiB speex aarch64 1.2.0-17.fc40 fedora 200.5 KiB srt-libs aarch64 1.5.3-2.fc40 fedora 922.9 KiB suitesparse aarch64 7.6.0-1.fc40 fedora 115.9 MiB svt-av1-libs aarch64 1.4.1-5.fc40 fedora 3.5 MiB systemd aarch64 255.4-1.fc40 fedora 26.1 MiB systemd-pam aarch64 255.4-1.fc40 fedora 1.4 MiB systemd-rpm-macros noarch 255.4-1.fc40 fedora 9.5 KiB tbb aarch64 2021.11.0-5.fc40 fedora 868.0 KiB tbb-bind aarch64 2021.11.0-5.fc40 fedora 195.6 KiB 
tbb2020.3 aarch64 2020.3-4.fc40 fedora 281.1 KiB tensorpipe aarch64 0-20220513.1.gitbb1473a4.fc37 copr_base 2.8 MiB tpm2-tss aarch64 4.0.1-7.fc40 fedora 3.2 MiB twolame-libs aarch64 0.4.0-4.fc40 fedora 221.6 KiB tzdata noarch 2024a-4.fc40 fedora 1.6 MiB unixODBC aarch64 2.3.12-4.fc40 fedora 2.8 MiB uriparser aarch64 0.9.7-5.fc40 fedora 484.4 KiB urw-base35-bookman-fonts noarch 20200910-19.fc40 fedora 1.4 MiB urw-base35-c059-fonts noarch 20200910-19.fc40 fedora 1.4 MiB urw-base35-d050000l-fonts noarch 20200910-19.fc40 fedora 84.3 KiB urw-base35-fonts noarch 20200910-19.fc40 fedora 5.3 KiB urw-base35-fonts-common noarch 20200910-19.fc40 fedora 37.4 KiB urw-base35-gothic-fonts noarch 20200910-19.fc40 fedora 1.2 MiB urw-base35-nimbus-mono-ps-fonts noarch 20200910-19.fc40 fedora 1.0 MiB urw-base35-nimbus-roman-fonts noarch 20200910-19.fc40 fedora 1.4 MiB urw-base35-nimbus-sans-fonts noarch 20200910-19.fc40 fedora 2.4 MiB urw-base35-p052-fonts noarch 20200910-19.fc40 fedora 1.5 MiB urw-base35-standard-symbols-ps-fonts noarch 20200910-19.fc40 fedora 44.2 KiB urw-base35-z003-fonts noarch 20200910-19.fc40 fedora 390.8 KiB utf8proc aarch64 2.7.0-7.fc40 fedora 538.4 KiB vapoursynth-libs aarch64 65-2.fc40 fedora 1.2 MiB vim-filesystem noarch 2:9.1.158-1.fc40 fedora 40.0 B vo-amrwbenc aarch64 0.1.3-20.fc40 fedora 241.8 KiB vtk aarch64 9.2.6-12.fc40 fedora 113.3 MiB xapian-core-libs aarch64 1.4.23-2.fc40 fedora 2.1 MiB xcb-util aarch64 0.4.1-5.fc40 fedora 198.4 KiB xcb-util-image aarch64 0.4.1-5.fc40 fedora 198.2 KiB xcb-util-keysyms aarch64 0.4.1-5.fc40 fedora 196.7 KiB xcb-util-renderutil aarch64 0.3.10-5.fc40 fedora 200.4 KiB xcb-util-wm aarch64 0.4.2-5.fc40 fedora 393.4 KiB xerces-c aarch64 3.2.5-2.fc40 fedora 3.6 MiB xkeyboard-config noarch 2.41-1.fc40 fedora 6.6 MiB xml-common noarch 0.6.3-63.fc40 fedora 78.4 KiB xorg-x11-proto-devel noarch 2023.2-4.fc40 fedora 1.7 MiB xvidcore aarch64 1.3.7-11.fc40 fedora 744.5 KiB zeromq aarch64 4.3.5-16.fc40 fedora 1.2 MiB zimg aarch64 
3.0.5-2.fc40 fedora 471.3 KiB zlib-ng-compat-devel aarch64 2.1.6-2.fc40 fedora 103.4 KiB zvbi aarch64 0.2.35-22.fc40 fedora 1.9 MiB Transaction Summary: Installing: 574 packages Total size of inbound packages is 2 GiB. Need to download 2 GiB. After this operation 8 GiB will be used (install 8 GiB, remove 0 B). [ 1/574] doxygen-2:1.10.0-3.fc40.aarch 100% | 140.1 MiB/s | 5.3 MiB | 00m00s [ 2/574] cutlass-devel-0:3.4.1-2024021 100% | 16.8 MiB/s | 774.1 KiB | 00m00s [ 3/574] fftw-devel-0:3.3.10-11.fc40.a 100% | 26.1 MiB/s | 133.5 KiB | 00m00s [ 4/574] eigen3-devel-0:3.4.0-15.fc40. 100% | 53.6 MiB/s | 1.2 MiB | 00m00s [ 5/574] foxi-devel-0:0-20210526.1.git 100% | 1.7 MiB/s | 24.5 KiB | 00m00s [ 6/574] fp16-devel-1:0-20240410.0.git 100% | 1.1 MiB/s | 12.6 KiB | 00m00s [ 7/574] flatbuffers-devel-0:23.5.26-6 100% | 2.2 MiB/s | 110.9 KiB | 00m00s [ 8/574] gemmlowp-devel-0:0-20231104.0 100% | 8.5 MiB/s | 157.2 KiB | 00m00s [ 9/574] flatbuffers-compiler-0:23.5.2 100% | 6.8 MiB/s | 940.8 KiB | 00m00s [ 10/574] git-0:2.44.0-1.fc40.aarch64 100% | 5.8 MiB/s | 53.3 KiB | 00m00s [ 11/574] glog-devel-0:0.3.5-20.fc40.aa 100% | 1.1 MiB/s | 37.7 KiB | 00m00s [ 12/574] gflags-devel-0:2.2.2-14.fc40. 100% | 373.0 KiB/s | 24.6 KiB | 00m00s [ 13/574] hiredis-devel-0:1.0.2-7.fc40. 100% | 2.3 MiB/s | 37.2 KiB | 00m00s [ 14/574] gmp-devel-1:6.2.1-8.fc40.aarc 100% | 7.4 MiB/s | 173.9 KiB | 00m00s [ 15/574] kineto-devel-0:0.4.0-20240327 100% | 2.0 MiB/s | 23.0 KiB | 00m00s [ 16/574] libuv-devel-1:1.48.0-1.fc40.a 100% | 5.1 MiB/s | 41.9 KiB | 00m00s [ 17/574] libzstd-devel-0:1.5.5-5.fc40. 100% | 16.5 MiB/s | 50.7 KiB | 00m00s [ 18/574] leveldb-devel-0:1.23-9.fc40.a 100% | 1.0 MiB/s | 52.5 KiB | 00m00s [ 19/574] lmdb-devel-0:0.9.32-1.fc40.aa 100% | 735.2 KiB/s | 25.7 KiB | 00m00s [ 20/574] mesa-libGLU-devel-0:9.0.3-4.f 100% | 1.2 MiB/s | 12.0 KiB | 00m00s [ 21/574] magma-devel-0:2.8.0-20240328. 
100% | 38.4 MiB/s | 903.7 KiB | 00m00s [ 22/574] gcc-c++-0:14.0.1-0.13.fc40.aa 100% | 63.0 MiB/s | 12.9 MiB | 00m00s [ 23/574] mpfr-devel-0:4.2.1-3.fc40.aar 100% | 1.5 MiB/s | 21.6 KiB | 00m00s [ 24/574] miniz-devel-0:3.0.2-5.fc40.aa 100% | 1.5 MiB/s | 32.5 KiB | 00m00s [ 25/574] numactl-devel-0:2.0.16-5.fc40 100% | 7.2 MiB/s | 22.0 KiB | 00m00s [ 26/574] neon2sse-devel-0:0-20230131.0 100% | 3.0 MiB/s | 84.7 KiB | 00m00s [ 27/574] nnpack-devel-0:0-20230201.0.g 100% | 508.7 KiB/s | 15.8 KiB | 00m00s [ 28/574] onnx-optimizer-devel-0:0.3.19 100% | 2.1 MiB/s | 50.5 KiB | 00m00s [ 29/574] ocl-icd-devel-0:2.3.2-5.fc40. 100% | 805.6 KiB/s | 58.0 KiB | 00m00s [ 30/574] openblas-devel-0:0.3.26-4.fc4 100% | 1.7 MiB/s | 82.3 KiB | 00m00s [ 31/574] protobuf-compat-compiler-0:3. 100% | 90.5 MiB/s | 833.9 KiB | 00m00s [ 32/574] peachpy-python3-0:0-20221113. 100% | 41.1 MiB/s | 674.0 KiB | 00m00s [ 33/574] protobuf-compat-devel-0:3.21. 100% | 52.2 MiB/s | 374.2 KiB | 00m00s [ 34/574] pybind11-devel-0:2.11.1-3.fc4 100% | 17.2 MiB/s | 176.1 KiB | 00m00s [ 35/574] python3-devel-0:3.12.2-2.fc40 100% | 12.7 MiB/s | 312.4 KiB | 00m00s [ 36/574] openblas-openmp-0:0.3.26-4.fc 100% | 42.6 MiB/s | 3.8 MiB | 00m00s [ 37/574] python3-pyyaml-0:6.0.1-14.fc4 100% | 12.9 MiB/s | 224.8 KiB | 00m00s [ 38/574] python3-pybind11-0:2.11.1-3.f 100% | 4.6 MiB/s | 197.5 KiB | 00m00s [ 39/574] python3-six-0:1.16.0-14.fc40. 100% | 4.4 MiB/s | 40.9 KiB | 00m00s [ 40/574] python3-setuptools-0:69.0.3-3 100% | 69.6 MiB/s | 1.5 MiB | 00m00s [ 41/574] python3-typing-extensions-0:4 100% | 4.4 MiB/s | 76.6 KiB | 00m00s [ 42/574] qnnpack-devel-0:0-20190828.2. 100% | 729.9 KiB/s | 12.4 KiB | 00m00s [ 43/574] rdma-core-devel-0:48.0-4.fc40 100% | 41.9 MiB/s | 429.0 KiB | 00m00s [ 44/574] snappy-devel-0:1.1.10-4.fc40. 
100% | 4.3 MiB/s | 21.8 KiB | 00m00s [ 45/574] python3-numpy-1:1.26.4-1.fc40 100% | 54.5 MiB/s | 6.6 MiB | 00m00s [ 46/574] rocksdb-devel-0:8.10.0-3.fc40 100% | 7.9 MiB/s | 306.5 KiB | 00m00s [ 47/574] tensorpipe-devel-0:0-20220513 100% | 10.7 MiB/s | 109.4 KiB | 00m00s [ 48/574] zeromq-devel-0:4.3.5-16.fc40. 100% | 2.4 MiB/s | 17.0 KiB | 00m00s [ 49/574] asmjit-devel-1:0-20220702.1.g 100% | 16.0 MiB/s | 229.9 KiB | 00m00s [ 50/574] cpuinfo-devel-1:0-20240327.0. 100% | 1.9 MiB/s | 23.9 KiB | 00m00s [ 51/574] cuda-cudart-devel-12-3-0:12.3 100% | 167.0 MiB/s | 2.0 MiB | 00m00s [ 52/574] cuda-driver-devel-12-3-0:12.3 100% | 5.2 MiB/s | 42.7 KiB | 00m00s [ 53/574] tbb-devel-0:2021.11.0-5.fc40. 100% | 2.4 MiB/s | 240.0 KiB | 00m00s [ 54/574] cuda-cupti-12-3-0:12.3.101-1. 100% | 234.7 MiB/s | 14.8 MiB | 00m00s [ 55/574] cuda-nvml-devel-12-3-0:12.3.1 100% | 7.3 MiB/s | 119.5 KiB | 00m00s [ 56/574] cuda-nvtx-12-3-0:12.3.101-1.a 100% | 17.4 MiB/s | 89.0 KiB | 00m00s [ 57/574] cuda-profiler-api-12-3-0:12.3 100% | 12.6 MiB/s | 25.9 KiB | 00m00s [ 58/574] fxdiv-devel-1:0-20201208.1.gi 100% | 668.4 KiB/s | 12.0 KiB | 00m00s [ 59/574] gloo-devel-1:0.5.0-20240411.0 100% | 9.1 MiB/s | 74.6 KiB | 00m00s [ 60/574] cuda-nvrtc-devel-12-3-0:12.3. 100% | 305.6 MiB/s | 21.7 MiB | 00m00s [ 61/574] libcublas-devel-12-3-0:12.3.4 100% | 2.2 MiB/s | 74.9 KiB | 00m00s [ 62/574] libcudnn8-devel-0:8.9.7.29-2. 100% | 2.1 MiB/s | 33.6 KiB | 00m00s [ 63/574] libcufft-devel-12-3-0:11.0.12 100% | 1.4 MiB/s | 33.0 KiB | 00m00s [ 64/574] libcusolver-devel-12-3-0:11.5 100% | 5.4 MiB/s | 61.2 KiB | 00m00s [ 65/574] cuda-nvcc-12-3-0:12.3.107-1.a 100% | 224.2 MiB/s | 58.7 MiB | 00m00s [ 66/574] libnccl-devel-0:2.21.5-1+cuda 100% | 7.8 MiB/s | 16.0 KiB | 00m00s [ 67/574] libnvjitlink-devel-12-3-0:12. 100% | 271.7 MiB/s | 17.1 MiB | 00m00s [ 68/574] libcurand-devel-12-3-0:10.3.4 100% | 212.9 MiB/s | 53.2 MiB | 00m00s [ 69/574] onnx-devel-0:1.17.0-20240404. 
100% | 1.8 MiB/s | 129.5 KiB | 00m00s [ 70/574] opencv-devel-0:4.9.0-20231227 100% | 42.6 MiB/s | 1.3 MiB | 00m00s [ 71/574] psimd-devel-1:0-20200517.2.gi 100% | 1.1 MiB/s | 13.0 KiB | 00m00s [ 72/574] pthreadpool-devel-1:0.1-20240 100% | 734.7 KiB/s | 14.7 KiB | 00m00s [ 73/574] sleef-devel-0:3.6-20240320.0. 100% | 1.2 MiB/s | 23.8 KiB | 00m00s [ 74/574] cmake-filesystem-0:3.28.2-1.f 100% | 8.6 MiB/s | 17.5 KiB | 00m00s [ 75/574] flatbuffers-0:23.5.26-6.fc40. 100% | 2.5 MiB/s | 188.3 KiB | 00m00s [ 76/574] graphviz-0:9.0.0-11.fc40.aarc 100% | 137.4 MiB/s | 4.9 MiB | 00m00s [ 77/574] perl-interpreter-4:5.38.2-506 100% | 35.3 MiB/s | 72.3 KiB | 00m00s [ 78/574] xapian-core-libs-0:1.4.23-2.f 100% | 140.8 MiB/s | 720.8 KiB | 00m00s [ 79/574] fftw-0:3.3.10-11.fc40.aarch64 100% | 9.9 MiB/s | 40.4 KiB | 00m00s [ 80/574] fftw-libs-0:3.3.10-11.fc40.aa 100% | 2.6 MiB/s | 8.0 KiB | 00m00s [ 81/574] foxi-0:0-20210526.1.gitc27858 100% | 1.2 MiB/s | 12.3 KiB | 00m00s [ 82/574] fp16-1:0-20240410.0.git581ac1 100% | 918.2 KiB/s | 11.9 KiB | 00m00s [ 83/574] gcc-0:14.0.1-0.13.fc40.aarch6 100% | 149.0 MiB/s | 34.0 MiB | 00m00s [ 84/574] libmpc-0:1.3.1-5.fc40.aarch64 100% | 23.6 MiB/s | 72.4 KiB | 00m00s [ 85/574] libstdc++-devel-0:14.0.1-0.13 100% | 151.6 MiB/s | 2.7 MiB | 00m00s [ 86/574] libcusparse-devel-12-3-0:12.2 100% | 133.1 MiB/s | 108.2 MiB | 00m01s [ 87/574] gflags-0:2.2.2-14.fc40.aarch6 100% | 711.3 KiB/s | 89.6 KiB | 00m00s [ 88/574] git-core-0:2.44.0-1.fc40.aarc 100% | 160.0 MiB/s | 4.6 MiB | 00m00s [ 89/574] git-core-doc-0:2.44.0-1.fc40. 
100% | 87.6 MiB/s | 2.9 MiB | 00m00s [ 90/574] perl-File-Basename-0:2.86-506 100% | 1.9 MiB/s | 17.6 KiB | 00m00s [ 91/574] perl-File-Find-0:1.43-506.fc4 100% | 2.3 MiB/s | 25.7 KiB | 00m00s [ 92/574] perl-Getopt-Long-1:2.57-3.fc4 100% | 5.1 MiB/s | 63.2 KiB | 00m00s [ 93/574] perl-Git-0:2.44.0-1.fc40.noar 100% | 5.6 MiB/s | 40.0 KiB | 00m00s [ 94/574] perl-PathTools-0:3.89-502.fc4 100% | 10.7 MiB/s | 87.5 KiB | 00m00s [ 95/574] perl-TermReadKey-0:2.38-21.fc 100% | 5.8 MiB/s | 35.5 KiB | 00m00s [ 96/574] perl-IPC-Open3-0:1.22-506.fc4 100% | 1.3 MiB/s | 22.3 KiB | 00m00s [ 97/574] perl-lib-0:0.65-506.fc40.aarc 100% | 5.0 MiB/s | 15.4 KiB | 00m00s [ 98/574] perl-libs-4:5.38.2-506.fc40.a 100% | 101.3 MiB/s | 2.3 MiB | 00m00s [ 99/574] gmp-c++-1:6.2.1-8.fc40.aarch6 100% | 6.0 MiB/s | 18.4 KiB | 00m00s [100/574] hiredis-0:1.0.2-7.fc40.aarch6 100% | 5.8 MiB/s | 41.8 KiB | 00m00s [101/574] glog-0:0.3.5-20.fc40.aarch64 100% | 1.5 MiB/s | 66.1 KiB | 00m00s [102/574] kineto-0:0.4.0-20240327.0.git 100% | 18.4 MiB/s | 282.2 KiB | 00m00s [103/574] libuv-1:1.48.0-1.fc40.aarch64 100% | 60.9 MiB/s | 249.5 KiB | 00m00s [104/574] libuv-static-1:1.48.0-1.fc40. 100% | 7.0 MiB/s | 107.5 KiB | 00m00s [105/574] leveldb-0:1.23-9.fc40.aarch64 100% | 2.5 MiB/s | 147.9 KiB | 00m00s [106/574] lmdb-libs-0:0.9.32-1.fc40.aar 100% | 19.9 MiB/s | 61.3 KiB | 00m00s [107/574] lmdb-0:0.9.32-1.fc40.aarch64 100% | 535.9 KiB/s | 32.7 KiB | 00m00s [108/574] gl-manpages-0:1.1-31.20190306 100% | 47.6 MiB/s | 1.2 MiB | 00m00s [109/574] libglvnd-devel-1:1.7.0-4.fc40 100% | 39.7 MiB/s | 162.6 KiB | 00m00s [110/574] mesa-libGLU-0:9.0.3-4.fc40.aa 100% | 37.5 MiB/s | 153.5 KiB | 00m00s [111/574] miniz-0:3.0.2-5.fc40.aarch64 100% | 1.4 MiB/s | 66.0 KiB | 00m00s [112/574] nnpack-0:0-20230201.0.git70a7 100% | 5.7 MiB/s | 81.2 KiB | 00m00s [113/574] numactl-libs-0:2.0.16-5.fc40. 
100% | 7.5 MiB/s | 30.5 KiB | 00m00s [114/574] ocl-icd-0:2.3.2-5.fc40.aarch6 100% | 19.6 MiB/s | 60.2 KiB | 00m00s [115/574] opencl-headers-0:3.0-21.20231 100% | 1.9 MiB/s | 88.8 KiB | 00m00s [116/574] onnx-optimizer-0:0.3.19-20240 100% | 6.6 MiB/s | 189.0 KiB | 00m00s [117/574] openblas-0:0.3.26-4.fc40.aarc 100% | 18.9 MiB/s | 38.6 KiB | 00m00s [118/574] openblas-openmp64-0:0.3.26-4. 100% | 87.2 MiB/s | 3.8 MiB | 00m00s [119/574] openblas-openmp64_-0:0.3.26-4 100% | 45.1 MiB/s | 3.7 MiB | 00m00s [120/574] cutlass-0:3.4.1-20240215.0.cu 100% | 145.6 MiB/s | 179.0 MiB | 00m01s [121/574] openblas-serial-0:0.3.26-4.fc 100% | 15.5 MiB/s | 3.7 MiB | 00m00s [122/574] openblas-serial64-0:0.3.26-4. 100% | 50.2 MiB/s | 3.6 MiB | 00m00s [123/574] openblas-serial64_-0:0.3.26-4 100% | 18.8 MiB/s | 3.6 MiB | 00m00s [124/574] openblas-threads-0:0.3.26-4.f 100% | 17.9 MiB/s | 3.8 MiB | 00m00s [125/574] openblas-threads64-0:0.3.26-4 100% | 45.7 MiB/s | 3.8 MiB | 00m00s [126/574] magma-0:2.8.0-20240328.0.cu12 100% | 122.6 MiB/s | 118.9 MiB | 00m01s [127/574] python3-0:3.12.2-2.fc40.aarch 100% | 13.3 MiB/s | 27.2 KiB | 00m00s [128/574] libgfortran-0:14.0.1-0.13.fc4 100% | 3.8 MiB/s | 451.9 KiB | 00m00s [129/574] zlib-ng-compat-devel-0:2.1.6- 100% | 17.6 MiB/s | 36.1 KiB | 00m00s [130/574] openblas-threads64_-0:0.3.26- 100% | 22.5 MiB/s | 3.7 MiB | 00m00s [131/574] protobuf-compat-0:3.21.9-2.fc 100% | 60.4 MiB/s | 989.0 KiB | 00m00s [132/574] python3-libs-0:3.12.2-2.fc40. 
100% | 152.6 MiB/s | 9.2 MiB | 00m00s [133/574] flexiblas-netlib-0:3.4.2-1.fc 100% | 38.8 MiB/s | 2.7 MiB | 00m00s [134/574] cmake-0:3.28.2-1.fc40.aarch64 100% | 79.3 MiB/s | 7.7 MiB | 00m00s [135/574] libyaml-0:0.2.5-14.fc40.aarch 100% | 1.7 MiB/s | 59.6 KiB | 00m00s [136/574] qnnpack-0:0-20190828.2.git7d2 100% | 1.7 MiB/s | 42.7 KiB | 00m00s [137/574] infiniband-diags-0:48.0-4.fc4 100% | 27.3 MiB/s | 335.7 KiB | 00m00s [138/574] libibverbs-0:48.0-4.fc40.aarc 100% | 42.6 MiB/s | 436.3 KiB | 00m00s [139/574] libibumad-0:48.0-4.fc40.aarch 100% | 647.6 KiB/s | 26.5 KiB | 00m00s [140/574] librdmacm-0:48.0-4.fc40.aarch 100% | 1.8 MiB/s | 72.8 KiB | 00m00s [141/574] snappy-0:1.1.10-4.fc40.aarch6 100% | 5.2 MiB/s | 36.9 KiB | 00m00s [142/574] tbb-0:2021.11.0-5.fc40.aarch6 100% | 14.6 MiB/s | 134.9 KiB | 00m00s [143/574] tensorpipe-0:0-20220513.1.git 100% | 48.2 MiB/s | 740.1 KiB | 00m00s [144/574] krb5-devel-0:1.21.2-5.fc40.aa 100% | 12.8 MiB/s | 144.0 KiB | 00m00s [145/574] tbb-bind-0:2021.11.0-5.fc40.a 100% | 561.7 KiB/s | 18.5 KiB | 00m00s [146/574] libunwind-devel-0:1.8.0-3.fc4 100% | 14.6 MiB/s | 104.6 KiB | 00m00s [147/574] rocksdb-0:8.10.0-3.fc40.aarch 100% | 32.6 MiB/s | 2.9 MiB | 00m00s [148/574] openpgm-devel-0:5.2.122-34.fc 100% | 7.3 MiB/s | 66.9 KiB | 00m00s [149/574] zeromq-0:4.3.5-16.fc40.aarch6 100% | 74.7 MiB/s | 458.7 KiB | 00m00s [150/574] libsodium-devel-0:1.0.19-4.fc 100% | 37.3 MiB/s | 1.1 MiB | 00m00s [151/574] cuda-cudart-12-3-0:12.3.101-1 100% | 32.5 MiB/s | 233.3 KiB | 00m00s [152/574] cuda-crt-12-3-0:12.3.107-1.aa 100% | 21.7 MiB/s | 111.2 KiB | 00m00s [153/574] cpuinfo-1:0-20240327.0.gitf42 100% | 2.0 MiB/s | 47.6 KiB | 00m00s [154/574] asmjit-1:0-20220702.1.gitc598 100% | 5.9 MiB/s | 203.9 KiB | 00m00s [155/574] gloo-1:0.5.0-20240411.0.git6c 100% | 27.0 MiB/s | 747.6 KiB | 00m00s [156/574] cuda-nvvm-12-3-0:12.3.107-1.a 100% | 281.4 MiB/s | 25.0 MiB | 00m00s [157/574] cuda-nvrtc-12-3-0:12.3.107-1. 
100% | 210.5 MiB/s | 23.2 MiB | 00m00s [158/574] libnvjitlink-12-3-0:12.3.101- 100% | 264.9 MiB/s | 19.1 MiB | 00m00s [159/574] onnx-libs-0:1.17.0-20240404.0 100% | 76.4 MiB/s | 781.9 KiB | 00m00s [160/574] opencv-0:4.9.0-20231227.1.cu1 100% | 90.5 MiB/s | 4.2 MiB | 00m00s [161/574] libcurand-12-3-0:10.3.4.107-1 100% | 179.9 MiB/s | 52.9 MiB | 00m00s [162/574] opencv-contrib-0:4.9.0-202312 100% | 38.1 MiB/s | 5.5 MiB | 00m00s [163/574] opencv-static-0:4.9.0-2023122 100% | 13.1 MiB/s | 390.4 KiB | 00m00s [164/574] pthreadpool-1:0.1-20240121.0. 100% | 1.4 MiB/s | 33.7 KiB | 00m00s [165/574] sleef-0:3.6-20240320.0.git60e 100% | 18.1 MiB/s | 482.9 KiB | 00m00s [166/574] cairo-0:1.18.0-3.fc40.aarch64 100% | 114.1 MiB/s | 701.3 KiB | 00m00s [167/574] expat-0:2.6.0-1.fc40.aarch64 100% | 13.5 MiB/s | 110.5 KiB | 00m00s [168/574] fontconfig-0:2.15.0-4.fc40.aa 100% | 53.6 MiB/s | 274.7 KiB | 00m00s [169/574] freetype-0:2.13.2-5.fc40.aarc 100% | 66.1 MiB/s | 406.1 KiB | 00m00s [170/574] gd-0:2.3.3-16.fc40.aarch64 100% | 43.5 MiB/s | 133.6 KiB | 00m00s [171/574] gdk-pixbuf2-0:2.42.10-8.fc40. 100% | 94.3 MiB/s | 483.0 KiB | 00m00s [172/574] glib2-0:2.80.0-1.fc40.aarch64 100% | 215.8 MiB/s | 3.0 MiB | 00m00s [173/574] gts-0:0.7.6-48.20121130.fc40. 
100% | 77.3 MiB/s | 237.6 KiB | 00m00s [174/574] harfbuzz-0:8.3.0-5.fc40.aarch 100% | 64.2 MiB/s | 985.9 KiB | 00m00s [175/574] lasi-0:1.1.3-13.fc40.aarch64 100% | 8.8 MiB/s | 53.8 KiB | 00m00s [176/574] libX11-0:1.8.7-3.fc40.aarch64 100% | 90.2 MiB/s | 646.9 KiB | 00m00s [177/574] libXrender-0:0.9.11-6.fc40.aa 100% | 13.2 MiB/s | 27.0 KiB | 00m00s [178/574] libgs-0:10.02.1-8.fc40.aarch6 100% | 123.4 MiB/s | 3.5 MiB | 00m00s [179/574] librsvg2-0:2.57.1-4.fc40.aarc 100% | 39.8 MiB/s | 1.5 MiB | 00m00s [180/574] libwebp-0:1.3.2-5.fc40.aarch6 100% | 60.2 MiB/s | 246.7 KiB | 00m00s [181/574] pango-0:1.51.2-1.fc40.aarch64 100% | 83.8 MiB/s | 343.1 KiB | 00m00s [182/574] poppler-glib-0:24.02.0-2.fc40 100% | 59.6 MiB/s | 183.2 KiB | 00m00s [183/574] urw-base35-fonts-0:20200910-1 100% | 4.9 MiB/s | 10.0 KiB | 00m00s [184/574] fftw-libs-double-0:3.3.10-11. 100% | 162.3 MiB/s | 830.7 KiB | 00m00s [185/574] fftw-libs-long-0:3.3.10-11.fc 100% | 58.9 MiB/s | 784.7 KiB | 00m00s [186/574] opencv-cuda-0:4.9.0-20231227. 100% | 99.6 MiB/s | 36.8 MiB | 00m00s [187/574] fftw-libs-single-0:3.3.10-11. 100% | 22.5 MiB/s | 874.4 KiB | 00m00s [188/574] libasan-0:14.0.1-0.13.fc40.aa 100% | 43.8 MiB/s | 493.0 KiB | 00m00s [189/574] libatomic-0:14.0.1-0.13.fc40. 
100% | 3.4 MiB/s | 34.3 KiB | 00m00s [190/574] libubsan-0:14.0.1-0.13.fc40.a 100% | 22.6 MiB/s | 208.3 KiB | 00m00s [191/574] make-1:4.4.1-6.fc40.aarch64 100% | 47.8 MiB/s | 587.7 KiB | 00m00s [192/574] less-0:643-4.fc40.aarch64 100% | 15.8 MiB/s | 177.4 KiB | 00m00s [193/574] openssh-clients-0:9.6p1-1.fc4 100% | 31.8 MiB/s | 748.8 KiB | 00m00s [194/574] perl-Carp-0:1.54-502.fc40.noa 100% | 819.8 KiB/s | 28.7 KiB | 00m00s [195/574] perl-Exporter-0:5.78-3.fc40.n 100% | 1.5 MiB/s | 30.8 KiB | 00m00s [196/574] cpp-0:14.0.1-0.13.fc40.aarch6 100% | 67.5 MiB/s | 10.7 MiB | 00m00s [197/574] perl-Text-ParseWords-0:3.31-5 100% | 775.7 KiB/s | 16.3 KiB | 00m00s [198/574] perl-base-0:2.27-506.fc40.noa 100% | 1.4 MiB/s | 16.6 KiB | 00m00s [199/574] perl-constant-0:1.33-503.fc40 100% | 2.2 MiB/s | 22.8 KiB | 00m00s [200/574] perl-Error-1:0.17029-15.fc40. 100% | 19.7 MiB/s | 40.4 KiB | 00m00s [201/574] perl-overload-0:1.37-506.fc40 100% | 9.0 MiB/s | 46.0 KiB | 00m00s [202/574] perl-Fcntl-0:1.15-506.fc40.aa 100% | 5.2 MiB/s | 21.2 KiB | 00m00s [203/574] perl-POSIX-0:2.13-506.fc40.aa 100% | 47.8 MiB/s | 97.9 KiB | 00m00s [204/574] perl-IO-0:1.52-506.fc40.aarch 100% | 16.2 MiB/s | 82.9 KiB | 00m00s [205/574] perl-Symbol-0:1.09-506.fc40.n 100% | 7.2 MiB/s | 14.6 KiB | 00m00s [206/574] perl-Errno-0:1.37-506.fc40.aa 100% | 3.0 MiB/s | 15.4 KiB | 00m00s [207/574] perl-Scalar-List-Utils-5:1.63 100% | 14.0 MiB/s | 71.5 KiB | 00m00s [208/574] perl-DynaLoader-0:1.54-506.fc 100% | 12.9 MiB/s | 26.5 KiB | 00m00s [209/574] perl-vars-0:1.05-506.fc40.noa 100% | 2.6 MiB/s | 13.4 KiB | 00m00s [210/574] perl-Encode-4:3.21-505.fc40.a 100% | 186.7 MiB/s | 1.7 MiB | 00m00s [211/574] libX11-devel-0:1.8.7-3.fc40.a 100% | 78.3 MiB/s | 1.0 MiB | 00m00s [212/574] libglvnd-1:1.7.0-4.fc40.aarch 100% | 13.3 MiB/s | 122.3 KiB | 00m00s [213/574] libglvnd-core-devel-1:1.7.0-4 100% | 4.2 MiB/s | 17.4 KiB | 00m00s [214/574] libglvnd-egl-1:1.7.0-4.fc40.a 100% | 9.0 MiB/s | 36.8 KiB | 00m00s [215/574] 
libglvnd-glx-1:1.7.0-4.fc40.a 100% | 44.6 MiB/s | 137.0 KiB | 00m00s [216/574] libglvnd-gles-1:1.7.0-4.fc40. 100% | 5.2 MiB/s | 32.1 KiB | 00m00s [217/574] libglvnd-opengl-1:1.7.0-4.fc4 100% | 8.6 MiB/s | 44.1 KiB | 00m00s [218/574] jsoncpp-0:1.9.5-7.fc40.aarch6 100% | 29.8 MiB/s | 91.4 KiB | 00m00s [219/574] cmake-data-0:3.28.2-1.fc40.no 100% | 174.5 MiB/s | 2.3 MiB | 00m00s [220/574] rhash-0:1.4.3-4.fc40.aarch64 100% | 23.6 MiB/s | 193.6 KiB | 00m00s [221/574] libb2-0:0.98.1-11.fc40.aarch6 100% | 5.9 MiB/s | 24.3 KiB | 00m00s [222/574] mpdecimal-0:2.5.1-9.fc40.aarc 100% | 28.9 MiB/s | 88.8 KiB | 00m00s [223/574] python-pip-wheel-0:23.3.2-1.f 100% | 163.4 MiB/s | 1.5 MiB | 00m00s [224/574] tzdata-0:2024a-4.fc40.noarch 100% | 53.8 MiB/s | 716.2 KiB | 00m00s [225/574] flexiblas-0:3.4.2-1.fc40.aarc 100% | 2.2 MiB/s | 25.1 KiB | 00m00s [226/574] flexiblas-openblas-openmp-0:3 100% | 1.8 MiB/s | 16.8 KiB | 00m00s [227/574] libnl3-0:3.9.0-3.fc40.aarch64 100% | 56.4 MiB/s | 346.7 KiB | 00m00s [228/574] perl-Getopt-Std-0:1.13-506.fc 100% | 1.3 MiB/s | 16.1 KiB | 00m00s [229/574] liburing-0:2.5-3.fc40.aarch64 100% | 5.6 MiB/s | 39.9 KiB | 00m00s [230/574] hwloc-libs-0:2.10.0-3.fc40.aa 100% | 83.0 MiB/s | 2.1 MiB | 00m00s [231/574] keyutils-libs-devel-0:1.6.3-3 100% | 2.0 MiB/s | 60.3 KiB | 00m00s [232/574] libcom_err-devel-0:1.47.0-5.f 100% | 1.5 MiB/s | 15.0 KiB | 00m00s [233/574] libselinux-devel-0:3.6-4.fc40 100% | 73.7 MiB/s | 150.9 KiB | 00m00s [234/574] libkadm5-0:1.21.2-5.fc40.aarc 100% | 19.5 MiB/s | 79.9 KiB | 00m00s [235/574] libverto-devel-0:0.3.2-8.fc40 100% | 2.3 MiB/s | 14.2 KiB | 00m00s [236/574] libsodium-0:1.0.19-4.fc40.aar 100% | 21.5 MiB/s | 132.4 KiB | 00m00s [237/574] libunwind-0:1.8.0-3.fc40.aarc 100% | 19.5 MiB/s | 80.0 KiB | 00m00s [238/574] openpgm-0:5.2.122-34.fc40.aar 100% | 33.9 MiB/s | 173.3 KiB | 00m00s [239/574] coin-or-CoinUtils-0:2.11.10-1 100% | 74.8 MiB/s | 459.8 KiB | 00m00s [240/574] coin-or-Clp-0:1.17.9-1.fc40.a 100% | 62.9 MiB/s | 
901.7 KiB | 00m00s [241/574] gstreamer1-0:1.22.9-1.fc40.aa 100% | 107.4 MiB/s | 1.4 MiB | 00m00s [242/574] gstreamer1-plugins-base-0:1.2 100% | 93.0 MiB/s | 2.1 MiB | 00m00s [243/574] libavformat-free-0:6.1.1-8.fc 100% | 55.0 MiB/s | 1.1 MiB | 00m00s [244/574] libavcodec-free-0:6.1.1-8.fc4 100% | 76.4 MiB/s | 4.0 MiB | 00m00s [245/574] libavutil-free-0:6.1.1-8.fc40 100% | 14.2 MiB/s | 348.5 KiB | 00m00s [246/574] libdc1394-0:2.2.7-5.fc40.aarc 100% | 11.6 MiB/s | 130.9 KiB | 00m00s [247/574] libjpeg-turbo-0:3.0.2-1.fc40. 100% | 42.6 MiB/s | 261.6 KiB | 00m00s [248/574] libpng-2:1.6.40-3.fc40.aarch6 100% | 9.4 MiB/s | 116.0 KiB | 00m00s [249/574] libswscale-free-0:6.1.1-8.fc4 100% | 16.6 MiB/s | 169.5 KiB | 00m00s [250/574] libtiff-0:4.6.0-2.fc40.aarch6 100% | 108.3 MiB/s | 332.7 KiB | 00m00s [251/574] openjpeg2-0:2.5.2-1.fc40.aarc 100% | 90.3 MiB/s | 185.0 KiB | 00m00s [252/574] openexr-libs-0:3.1.10-5.fc40. 100% | 81.7 MiB/s | 1.1 MiB | 00m00s [253/574] qt5-qtbase-0:5.15.13-1.fc40.a 100% | 28.9 MiB/s | 3.5 MiB | 00m00s [254/574] opencv-core-0:4.9.0-20231227. 100% | 65.2 MiB/s | 8.9 MiB | 00m00s [255/574] qt5-qtbase-gui-0:5.15.13-1.fc 100% | 24.1 MiB/s | 6.4 MiB | 00m00s [256/574] hdf5-0:1.12.1-15.fc40.aarch64 100% | 73.3 MiB/s | 2.1 MiB | 00m00s [257/574] libXext-0:1.3.6-1.fc40.aarch6 100% | 18.9 MiB/s | 38.7 KiB | 00m00s [258/574] libxcb-0:1.16-4.fc40.aarch64 100% | 59.9 MiB/s | 245.5 KiB | 00m00s [259/574] pixman-0:0.43.0-3.fc40.aarch6 100% | 30.5 MiB/s | 218.7 KiB | 00m00s [260/574] default-fonts-core-sans-0:4.0 100% | 15.5 MiB/s | 31.7 KiB | 00m00s [261/574] fonts-filesystem-1:2.0.5-14.f 100% | 1.3 MiB/s | 8.2 KiB | 00m00s [262/574] xml-common-0:0.6.3-63.fc40.no 100% | 6.1 MiB/s | 31.0 KiB | 00m00s [263/574] libXpm-0:3.5.17-3.fc40.aarch6 100% | 31.4 MiB/s | 64.2 KiB | 00m00s [264/574] libavif-0:1.0.4-1.fc40.aarch6 100% | 21.8 MiB/s | 89.1 KiB | 00m00s [265/574] libimagequant-0:4.0.3-3.fc40. 
100% | 47.8 MiB/s | 293.6 KiB | 00m00s [266/574] shared-mime-info-0:2.3-4.fc40 100% | 126.5 MiB/s | 388.7 KiB | 00m00s [267/574] ceres-solver-0:2.2.0-4.fc40.a 100% | 12.0 MiB/s | 1.1 MiB | 00m00s [268/574] netpbm-0:11.02.00-6.fc40.aarc 100% | 45.0 MiB/s | 184.4 KiB | 00m00s [269/574] gnutls-0:3.8.3-2.fc40.aarch64 100% | 70.1 MiB/s | 1.1 MiB | 00m00s [270/574] graphite2-0:1.3.14-15.fc40.aa 100% | 10.0 MiB/s | 92.1 KiB | 00m00s [271/574] libX11-common-0:1.8.7-3.fc40. 100% | 24.5 MiB/s | 175.9 KiB | 00m00s [272/574] adobe-mappings-cmap-0:2023062 100% | 142.0 MiB/s | 2.1 MiB | 00m00s [273/574] adobe-mappings-cmap-deprecate 100% | 10.1 MiB/s | 114.0 KiB | 00m00s [274/574] adobe-mappings-pdf-0:20190401 100% | 40.0 MiB/s | 695.9 KiB | 00m00s [275/574] cups-libs-1:2.4.7-11.fc40.aar 100% | 13.8 MiB/s | 268.3 KiB | 00m00s [276/574] jbig2dec-libs-0:0.20-4.fc40.a 100% | 11.7 MiB/s | 72.1 KiB | 00m00s [277/574] lcms2-0:2.16-3.fc40.aarch64 100% | 29.9 MiB/s | 183.7 KiB | 00m00s [278/574] libXt-0:1.3.0-3.fc40.aarch64 100% | 21.6 MiB/s | 176.8 KiB | 00m00s [279/574] libijs-0:0.35-22.fc40.aarch64 100% | 4.1 MiB/s | 29.3 KiB | 00m00s [280/574] google-droid-sans-fonts-0:202 100% | 75.2 MiB/s | 2.7 MiB | 00m00s [281/574] libpaper-1:2.1.1-3.fc40.aarch 100% | 2.9 MiB/s | 27.0 KiB | 00m00s [282/574] cairo-gobject-0:1.18.0-3.fc40 100% | 2.6 MiB/s | 18.6 KiB | 00m00s [283/574] rsvg-pixbuf-loader-0:2.57.1-4 100% | 3.2 MiB/s | 16.3 KiB | 00m00s [284/574] fribidi-0:1.0.13-4.fc40.aarch 100% | 22.4 MiB/s | 91.6 KiB | 00m00s [285/574] libXft-0:2.3.8-6.fc40.aarch64 100% | 13.9 MiB/s | 71.3 KiB | 00m00s [286/574] libthai-0:0.1.29-8.fc40.aarch 100% | 29.7 MiB/s | 213.2 KiB | 00m00s [287/574] urw-base35-bookman-fonts-0:20 100% | 59.1 MiB/s | 846.9 KiB | 00m00s [288/574] poppler-0:24.02.0-2.fc40.aarc 100% | 57.7 MiB/s | 1.2 MiB | 00m00s [289/574] urw-base35-c059-fonts-0:20200 100% | 213.4 MiB/s | 874.0 KiB | 00m00s [290/574] urw-base35-d050000l-fonts-0:2 100% | 24.6 MiB/s | 75.7 KiB | 00m00s 
[291/574] urw-base35-fonts-common-0:202 100% | 6.8 MiB/s | 20.8 KiB | 00m00s [292/574] urw-base35-gothic-fonts-0:202 100% | 125.5 MiB/s | 642.5 KiB | 00m00s [293/574] urw-base35-nimbus-mono-ps-fon 100% | 45.6 MiB/s | 794.6 KiB | 00m00s [294/574] urw-base35-nimbus-roman-fonts 100% | 55.7 MiB/s | 855.9 KiB | 00m00s [295/574] urw-base35-nimbus-sans-fonts- 100% | 108.8 MiB/s | 1.3 MiB | 00m00s [296/574] urw-base35-p052-fonts-0:20200 100% | 59.4 MiB/s | 973.2 KiB | 00m00s [297/574] urw-base35-standard-symbols-p 100% | 8.1 MiB/s | 41.5 KiB | 00m00s [298/574] urw-base35-z003-fonts-0:20200 100% | 38.4 MiB/s | 275.5 KiB | 00m00s [299/574] libedit-0:3.1-50.20230828cvs. 100% | 26.2 MiB/s | 107.2 KiB | 00m00s [300/574] libfido2-0:1.14.0-4.fc40.aarc 100% | 18.7 MiB/s | 95.8 KiB | 00m00s [301/574] openssh-0:9.6p1-1.fc40.2.aarc 100% | 83.2 MiB/s | 425.9 KiB | 00m00s [302/574] perl-mro-0:1.28-506.fc40.aarc 100% | 3.1 MiB/s | 29.0 KiB | 00m00s [303/574] perl-overloading-0:0.02-506.f 100% | 702.6 KiB/s | 13.3 KiB | 00m00s [304/574] perl-File-stat-0:1.13-506.fc4 100% | 2.5 MiB/s | 17.6 KiB | 00m00s [305/574] perl-SelectSaver-0:1.02-506.f 100% | 1.5 MiB/s | 12.2 KiB | 00m00s [306/574] guile30-0:3.0.7-12.fc40.aarch 100% | 105.9 MiB/s | 8.2 MiB | 00m00s [307/574] perl-Socket-4:2.037-5.fc40.aa 100% | 4.9 MiB/s | 55.7 KiB | 00m00s [308/574] perl-locale-0:1.10-506.fc40.n 100% | 6.9 MiB/s | 14.1 KiB | 00m00s [309/574] perl-MIME-Base64-0:3.16-503.f 100% | 9.7 MiB/s | 29.9 KiB | 00m00s [310/574] perl-Storable-1:3.32-502.fc40 100% | 11.9 MiB/s | 97.4 KiB | 00m00s [311/574] perl-parent-1:0.241-502.fc40. 
100% | 2.4 MiB/s | 14.7 KiB | 00m00s [312/574] libX11-xcb-0:1.8.7-3.fc40.aar 100% | 1.9 MiB/s | 12.0 KiB | 00m00s [313/574] libxcb-devel-0:1.16-4.fc40.aa 100% | 159.3 MiB/s | 1.4 MiB | 00m00s [314/574] xorg-x11-proto-devel-0:2023.2 100% | 41.5 MiB/s | 297.6 KiB | 00m00s [315/574] mesa-libEGL-0:24.0.4-1.fc40.a 100% | 32.9 MiB/s | 134.9 KiB | 00m00s [316/574] emacs-filesystem-1:29.2-3.fc4 100% | 2.5 MiB/s | 7.8 KiB | 00m00s [317/574] mesa-libGL-0:24.0.4-1.fc40.aa 100% | 46.1 MiB/s | 189.0 KiB | 00m00s [318/574] vim-filesystem-2:9.1.158-1.fc 100% | 3.4 MiB/s | 17.5 KiB | 00m00s [319/574] libsepol-devel-0:3.6-3.fc40.a 100% | 7.9 MiB/s | 48.7 KiB | 00m00s [320/574] pcre2-devel-0:10.42-2.fc40.2. 100% | 164.2 MiB/s | 504.6 KiB | 00m00s [321/574] asl-0:20240106-1.20240201git2 100% | 78.4 MiB/s | 481.7 KiB | 00m00s [322/574] MUMPS-0:5.6.2-3.fc40.aarch64 100% | 118.3 MiB/s | 1.9 MiB | 00m00s [323/574] coin-or-Osi-0:0.108.9-2.fc40. 100% | 143.1 MiB/s | 1.9 MiB | 00m00s [324/574] libnccl-0:2.21.5-1+cuda12.4.a 100% | 69.7 MiB/s | 130.0 MiB | 00m02s [325/574] coin-or-Cbc-0:2.10.11-2.fc40. 
100% | 4.2 MiB/s | 789.0 KiB | 00m00s [326/574] glpk-0:5.0-11.fc40.aarch64 100% | 8.2 MiB/s | 360.2 KiB | 00m00s [327/574] alsa-lib-0:1.2.11-2.fc40.aarc 100% | 10.2 MiB/s | 511.6 KiB | 00m00s [328/574] graphene-0:1.10.6-8.fc40.aarc 100% | 3.9 MiB/s | 63.4 KiB | 00m00s [329/574] cdparanoia-libs-0:10.2-44.fc4 100% | 2.2 MiB/s | 53.8 KiB | 00m00s [330/574] libXi-0:1.8.1-5.fc40.aarch64 100% | 2.0 MiB/s | 39.6 KiB | 00m00s [331/574] libXv-0:1.0.12-3.fc40.aarch64 100% | 581.7 KiB/s | 18.6 KiB | 00m00s [332/574] libdrm-0:2.4.120-3.fc40.aarch 100% | 7.5 MiB/s | 130.7 KiB | 00m00s [333/574] libgudev-0:238-5.fc40.aarch64 100% | 1.2 MiB/s | 33.9 KiB | 00m00s [334/574] libogg-2:1.3.5-8.fc40.aarch64 100% | 643.8 KiB/s | 32.8 KiB | 00m00s [335/574] libtheora-1:1.1.1-36.fc40.aar 100% | 22.7 MiB/s | 163.0 KiB | 00m00s [336/574] iso-codes-0:4.16.0-3.fc40.noa 100% | 21.6 MiB/s | 3.5 MiB | 00m00s [337/574] libvisual-1:0.4.1-4.fc40.aarc 100% | 20.2 MiB/s | 145.1 KiB | 00m00s [338/574] libwayland-client-0:1.22.0-3. 100% | 4.0 MiB/s | 32.8 KiB | 00m00s [339/574] libvorbis-1:1.3.7-10.fc40.aar 100% | 11.7 MiB/s | 191.8 KiB | 00m00s [340/574] libwayland-cursor-0:1.22.0-3. 100% | 2.6 MiB/s | 18.9 KiB | 00m00s [341/574] libwayland-egl-0:1.22.0-3.fc4 100% | 3.1 MiB/s | 12.6 KiB | 00m00s [342/574] mesa-libgbm-0:24.0.4-1.fc40.a 100% | 9.4 MiB/s | 48.0 KiB | 00m00s [343/574] opus-0:1.5.1-1.fc40.aarch64 100% | 44.4 MiB/s | 227.4 KiB | 00m00s [344/574] orc-0:0.4.38-2.fc40.aarch64 100% | 20.1 MiB/s | 226.5 KiB | 00m00s [345/574] fdk-aac-free-0:2.0.0-13.fc40. 
100% | 32.1 MiB/s | 328.7 KiB | 00m00s [346/574] suitesparse-0:7.6.0-1.fc40.aa 100% | 42.7 MiB/s | 19.4 MiB | 00m00s [347/574] codec2-0:1.2.0-4.fc40.aarch64 100% | 14.1 MiB/s | 637.0 KiB | 00m00s [348/574] gsm-0:1.0.22-6.fc40.aarch64 100% | 1.5 MiB/s | 36.1 KiB | 00m00s [349/574] ilbc-0:3.0.4-10.fc40.aarch64 100% | 6.4 MiB/s | 52.2 KiB | 00m00s [350/574] lame-libs-0:3.100-17.fc40.aar 100% | 55.0 MiB/s | 337.7 KiB | 00m00s [351/574] libdav1d-0:1.4.0-1.fc40.aarch 100% | 34.6 MiB/s | 354.7 KiB | 00m00s [352/574] libaom-0:3.8.2-1.fc40.aarch64 100% | 80.9 MiB/s | 1.5 MiB | 00m00s [353/574] libjxl-1:0.8.2-6.fc40.aarch64 100% | 51.5 MiB/s | 791.5 KiB | 00m00s [354/574] libswresample-free-0:6.1.1-8. 100% | 7.9 MiB/s | 64.9 KiB | 00m00s [355/574] libva-0:2.21.0-3.fc40.aarch64 100% | 11.9 MiB/s | 109.6 KiB | 00m00s [356/574] opencore-amr-0:0.1.6-6.fc40.a 100% | 24.4 MiB/s | 174.9 KiB | 00m00s [357/574] speex-0:1.2.0-17.fc40.aarch64 100% | 10.5 MiB/s | 64.6 KiB | 00m00s [358/574] libvpx-0:1.14.0-1.fc40.aarch6 100% | 72.0 MiB/s | 1.2 MiB | 00m00s [359/574] rav1e-libs-0:0.7.1-1.fc40.aar 100% | 59.9 MiB/s | 797.9 KiB | 00m00s [360/574] twolame-libs-0:0.4.0-4.fc40.a 100% | 16.8 MiB/s | 68.8 KiB | 00m00s [361/574] svt-av1-libs-0:1.4.1-5.fc40.a 100% | 105.4 MiB/s | 1.1 MiB | 00m00s [362/574] xvidcore-0:1.3.7-11.fc40.aarc 100% | 32.0 MiB/s | 229.3 KiB | 00m00s [363/574] zvbi-0:0.2.35-22.fc40.aarch64 100% | 82.0 MiB/s | 419.7 KiB | 00m00s [364/574] vo-amrwbenc-0:0.1.3-20.fc40.a 100% | 5.4 MiB/s | 76.7 KiB | 00m00s [365/574] libbluray-0:1.3.4-5.fc40.aarc 100% | 20.3 MiB/s | 166.0 KiB | 00m00s [366/574] libchromaprint-0:1.5.1-17.fc4 100% | 9.9 MiB/s | 40.4 KiB | 00m00s [367/574] game-music-emu-0:0.6.3-14.fc4 100% | 14.8 MiB/s | 151.3 KiB | 00m00s [368/574] libmodplug-1:0.8.9.0-19.fc40. 
100% | 13.9 MiB/s | 171.4 KiB | 00m00s [369/574] libgcrypt-0:1.10.3-3.fc40.aar 100% | 31.7 MiB/s | 454.7 KiB | 00m00s [370/574] libopenmpt-0:0.7.3-3.fc40.aar 100% | 46.0 MiB/s | 659.5 KiB | 00m00s [371/574] librabbitmq-0:0.13.0-5.fc40.a 100% | 5.4 MiB/s | 44.0 KiB | 00m00s [372/574] librist-0:0.2.7-4.fc40.aarch6 100% | 10.7 MiB/s | 76.5 KiB | 00m00s [373/574] libsmbclient-2:4.20.0-0.5.rc4 100% | 11.5 MiB/s | 82.6 KiB | 00m00s [374/574] srt-libs-0:1.5.3-2.fc40.aarch 100% | 48.7 MiB/s | 349.2 KiB | 00m00s [375/574] vapoursynth-libs-0:65-2.fc40. 100% | 32.0 MiB/s | 328.1 KiB | 00m00s [376/574] libvdpau-0:1.5-6.fc40.aarch64 100% | 1.8 MiB/s | 16.5 KiB | 00m00s [377/574] libusb1-0:1.0.27-1.fc40.aarch 100% | 14.8 MiB/s | 75.7 KiB | 00m00s [378/574] jbigkit-libs-0:2.1-29.fc40.aa 100% | 8.6 MiB/s | 53.0 KiB | 00m00s [379/574] libraw1394-0:2.1.2-20.fc40.aa 100% | 5.8 MiB/s | 65.4 KiB | 00m00s [380/574] liblerc-0:4.0.0-6.fc40.aarch6 100% | 46.2 MiB/s | 189.4 KiB | 00m00s [381/574] imath-0:3.1.10-1.fc40.aarch64 100% | 18.3 MiB/s | 93.8 KiB | 00m00s [382/574] double-conversion-0:3.3.0-3.f 100% | 11.9 MiB/s | 48.6 KiB | 00m00s [383/574] dbus-libs-1:1.14.10-3.fc40.aa 100% | 7.3 MiB/s | 155.9 KiB | 00m00s [384/574] libproxy-0:0.5.3-5.fc40.aarch 100% | 2.0 MiB/s | 48.0 KiB | 00m00s [385/574] pcre2-utf16-0:10.42-2.fc40.2. 
100% | 18.0 MiB/s | 202.6 KiB | 00m00s [386/574] qt-settings-0:40.0-1.fc40.noa 100% | 843.0 KiB/s | 10.1 KiB | 00m00s [387/574] qt5-qtbase-common-0:5.15.13-1 100% | 593.8 KiB/s | 11.9 KiB | 00m00s [388/574] glx-utils-0:9.0.0-6.fc40.aarc 100% | 3.3 MiB/s | 75.3 KiB | 00m00s [389/574] libICE-0:1.1.1-3.fc40.aarch64 100% | 4.0 MiB/s | 73.6 KiB | 00m00s [390/574] libSM-0:1.2.4-3.fc40.aarch64 100% | 1.7 MiB/s | 43.0 KiB | 00m00s [391/574] libinput-0:1.25.0-3.fc40.aarc 100% | 7.9 MiB/s | 210.0 KiB | 00m00s [392/574] libxkbcommon-0:1.6.0-2.fc40.a 100% | 4.0 MiB/s | 142.7 KiB | 00m00s [393/574] libxkbcommon-x11-0:1.6.0-2.fc 100% | 693.6 KiB/s | 20.8 KiB | 00m00s [394/574] xcb-util-image-0:0.4.1-5.fc40 100% | 1.1 MiB/s | 18.7 KiB | 00m00s [395/574] xcb-util-keysyms-0:0.4.1-5.fc 100% | 793.3 KiB/s | 14.3 KiB | 00m00s [396/574] xcb-util-renderutil-0:0.3.10- 100% | 865.4 KiB/s | 17.3 KiB | 00m00s [397/574] xcb-util-wm-0:0.4.2-5.fc40.aa 100% | 1.6 MiB/s | 30.8 KiB | 00m00s [398/574] tbb2020.3-0:2020.3-4.fc40.aar 100% | 4.3 MiB/s | 93.3 KiB | 00m00s [399/574] protobuf-0:3.19.6-8.fc40.aarc 100% | 27.6 MiB/s | 931.9 KiB | 00m00s [400/574] libicu-0:74.2-1.fc40.aarch64 100% | 38.5 MiB/s | 10.4 MiB | 00m00s [401/574] libaec-0:1.1.2-1.fc40.aarch64 100% | 3.0 MiB/s | 36.4 KiB | 00m00s [402/574] libXau-0:1.0.11-6.fc40.aarch6 100% | 2.1 MiB/s | 32.1 KiB | 00m00s [403/574] halide-0:17.0.1-20240220.0.fc 100% | 159.4 MiB/s | 19.8 MiB | 00m00s [404/574] abattis-cantarell-vf-fonts-0: 100% | 5.6 MiB/s | 120.3 KiB | 00m00s [405/574] google-noto-sans-vf-fonts-0:2 100% | 44.6 MiB/s | 593.3 KiB | 00m00s [406/574] nettle-0:3.9.1-6.fc40.aarch64 100% | 21.3 MiB/s | 435.3 KiB | 00m00s [407/574] avahi-libs-0:0.8-26.fc40.aarc 100% | 8.1 MiB/s | 66.6 KiB | 00m00s [408/574] libdatrie-0:0.2.13-9.fc40.aar 100% | 3.9 MiB/s | 32.1 KiB | 00m00s [409/574] gpgmepp-0:1.23.2-3.fc40.aarch 100% | 15.9 MiB/s | 130.4 KiB | 00m00s [410/574] nspr-0:4.35.0-21.fc40.aarch64 100% | 19.0 MiB/s | 136.4 KiB | 00m00s [411/574] 
nss-0:3.98.0-1.fc40.aarch64 100% | 56.9 MiB/s | 699.2 KiB | 00m00s [412/574] gc-0:8.2.2-6.fc40.aarch64 100% | 7.7 MiB/s | 109.7 KiB | 00m00s [413/574] poppler-data-0:0.4.11-7.fc40. 100% | 65.1 MiB/s | 2.0 MiB | 00m00s [414/574] libcbor-0:0.11.0-1.fc40.aarch 100% | 2.7 MiB/s | 32.7 KiB | 00m00s [415/574] libXau-devel-0:1.0.11-6.fc40. 100% | 3.3 MiB/s | 13.6 KiB | 00m00s [416/574] perl-Class-Struct-0:0.68-506. 100% | 2.7 MiB/s | 22.5 KiB | 00m00s [417/574] libwayland-server-0:1.22.0-3. 100% | 10.2 MiB/s | 41.8 KiB | 00m00s [418/574] libxshmfence-0:1.3.2-3.fc40.a 100% | 2.0 MiB/s | 12.4 KiB | 00m00s [419/574] mesa-libglapi-0:24.0.4-1.fc40 100% | 11.1 MiB/s | 68.4 KiB | 00m00s [420/574] libXfixes-0:6.0.1-3.fc40.aarc 100% | 4.8 MiB/s | 19.5 KiB | 00m00s [421/574] libXxf86vm-0:1.1.5-6.fc40.aar 100% | 4.3 MiB/s | 17.8 KiB | 00m00s [422/574] pcre2-utf32-0:10.42-2.fc40.2. 100% | 15.4 MiB/s | 189.7 KiB | 00m00s [423/574] MUMPS-common-0:5.6.2-3.fc40.n 100% | 57.5 MiB/s | 882.6 KiB | 00m00s [424/574] scotch-0:7.0.4-3.fc40.aarch64 100% | 27.1 MiB/s | 277.0 KiB | 00m00s [425/574] scotch-devel-0:7.0.4-3.fc40.a 100% | 3.5 MiB/s | 24.9 KiB | 00m00s [426/574] coin-or-Cgl-0:0.60.8-1.fc40.a 100% | 43.4 MiB/s | 400.0 KiB | 00m00s [427/574] vtk-0:9.2.6-12.fc40.aarch64 100% | 80.4 MiB/s | 22.8 MiB | 00m00s [428/574] libnauty-0:2.8.8-3.fc40.aarch 100% | 20.2 MiB/s | 703.7 KiB | 00m00s [429/574] highway-0:1.1.0-1.fc40.aarch6 100% | 10.5 MiB/s | 96.5 KiB | 00m00s [430/574] soxr-0:0.1.3-15.fc40.aarch64 100% | 8.8 MiB/s | 71.8 KiB | 00m00s [431/574] lpcnetfreedv-0:0.5-5.fc40.aar 100% | 155.9 MiB/s | 7.3 MiB | 00m00s [432/574] mesa-filesystem-0:24.0.4-1.fc 100% | 1.2 MiB/s | 19.7 KiB | 00m00s [433/574] libudfread-0:1.1.2-8.fc40.aar 100% | 2.2 MiB/s | 34.5 KiB | 00m00s [434/574] libgpg-error-0:1.48-1.fc40.aa 100% | 37.8 MiB/s | 232.3 KiB | 00m00s [435/574] cjson-0:1.7.15-4.fc40.aarch64 100% | 6.2 MiB/s | 31.6 KiB | 00m00s [436/574] mbedtls-0:2.28.7-1.fc40.aarch 100% | 65.4 MiB/s | 401.8 KiB | 
00m00s [437/574] libtalloc-0:2.4.2-1.fc40.aarc 100% | 7.4 MiB/s | 30.2 KiB | 00m00s [438/574] mpg123-libs-0:1.31.3-4.fc40.a 100% | 30.9 MiB/s | 348.1 KiB | 00m00s [439/574] libtevent-0:0.16.1-1.fc40.aar 100% | 5.1 MiB/s | 47.5 KiB | 00m00s [440/574] libwbclient-2:4.20.0-0.5.rc4. 100% | 3.0 MiB/s | 49.9 KiB | 00m00s [441/574] samba-common-2:4.20.0-0.5.rc4 100% | 6.9 MiB/s | 154.3 KiB | 00m00s [442/574] samba-common-libs-2:4.20.0-0. 100% | 6.3 MiB/s | 115.3 KiB | 00m00s [443/574] zimg-0:3.0.5-2.fc40.aarch64 100% | 11.7 MiB/s | 143.3 KiB | 00m00s [444/574] duktape-0:2.7.0-7.fc40.aarch6 100% | 11.9 MiB/s | 171.2 KiB | 00m00s [445/574] libevdev-0:1.13.1-4.fc40.aarc 100% | 4.6 MiB/s | 42.6 KiB | 00m00s [446/574] mtdev-0:1.1.6-8.fc40.aarch64 100% | 3.4 MiB/s | 20.8 KiB | 00m00s [447/574] libwacom-0:2.10.0-1.fc40.aarc 100% | 3.8 MiB/s | 42.8 KiB | 00m00s [448/574] samba-client-libs-2:4.20.0-0. 100% | 82.9 MiB/s | 5.8 MiB | 00m00s [449/574] xcb-util-0:0.4.1-5.fc40.aarch 100% | 1.2 MiB/s | 18.9 KiB | 00m00s [450/574] cgnslib-libs-0:4.4.0-4.fc40.a 100% | 23.8 MiB/s | 293.0 KiB | 00m00s [451/574] libGLEW-0:2.2.0-7.fc40.aarch6 100% | 57.3 MiB/s | 176.2 KiB | 00m00s [452/574] libXcursor-0:1.2.1-7.fc40.aar 100% | 9.9 MiB/s | 30.3 KiB | 00m00s [453/574] libharu-0:2.4.3-5.fc40.aarch6 100% | 63.0 MiB/s | 581.0 KiB | 00m00s [454/574] mariadb-connector-c-0:3.3.8-3 100% | 40.8 MiB/s | 209.1 KiB | 00m00s [455/574] netcdf-0:4.9.2-5.fc40.aarch64 100% | 73.3 MiB/s | 825.4 KiB | 00m00s [456/574] openslide-0:4.0.0-3.fc40.aarc 100% | 14.0 MiB/s | 128.7 KiB | 00m00s [457/574] gdal-libs-0:3.8.4-2.fc40.aarc 100% | 90.4 MiB/s | 8.5 MiB | 00m00s [458/574] proj-0:9.3.1-3.fc40.aarch64 100% | 30.0 MiB/s | 1.4 MiB | 00m00s [459/574] xkeyboard-config-0:2.41-1.fc4 100% | 7.7 MiB/s | 975.9 KiB | 00m00s [460/574] pugixml-0:1.13-5.fc40.aarch64 100% | 5.6 MiB/s | 96.9 KiB | 00m00s [461/574] google-noto-fonts-common-0:20 100% | 2.8 MiB/s | 17.3 KiB | 00m00s [462/574] gpgme-0:1.23.2-3.fc40.aarch64 100% | 34.3 
MiB/s | 210.9 KiB | 00m00s [463/574] crypto-policies-scripts-0:202 100% | 28.6 MiB/s | 117.3 KiB | 00m00s [464/574] libassuan-0:2.5.7-1.fc40.aarc 100% | 4.6 MiB/s | 66.6 KiB | 00m00s [465/574] nss-softokn-0:3.98.0-1.fc40.a 100% | 58.0 MiB/s | 415.9 KiB | 00m00s [466/574] nss-sysinit-0:3.98.0-1.fc40.a 100% | 3.0 MiB/s | 18.7 KiB | 00m00s [467/574] nss-util-0:3.98.0-1.fc40.aarc 100% | 10.6 MiB/s | 86.7 KiB | 00m00s [468/574] cliquer-libs-0:1.22-8.fc40.aa 100% | 3.7 MiB/s | 38.2 KiB | 00m00s [469/574] libldb-0:2.9.0-1.fc40.aarch64 100% | 11.3 MiB/s | 185.2 KiB | 00m00s [470/574] libtdb-0:1.4.10-1.fc40.aarch6 100% | 4.0 MiB/s | 52.7 KiB | 00m00s [471/574] libwacom-data-0:2.10.0-1.fc40 100% | 13.7 MiB/s | 196.4 KiB | 00m00s [472/574] armadillo-0:12.8.1-1.fc40.aar 100% | 2.2 MiB/s | 31.0 KiB | 00m00s [473/574] freexl-0:2.0.0-7.fc40.aarch64 100% | 3.0 MiB/s | 45.7 KiB | 00m00s [474/574] cfitsio-0:4.4.0-2.fc40.aarch6 100% | 25.1 MiB/s | 591.9 KiB | 00m00s [475/574] giflib-0:5.2.2-1.fc40.aarch64 100% | 4.7 MiB/s | 52.6 KiB | 00m00s [476/574] geos-0:3.12.1-3.fc40.aarch64 100% | 39.9 MiB/s | 1.0 MiB | 00m00s [477/574] json-c-0:0.17-3.fc40.aarch64 100% | 2.9 MiB/s | 45.3 KiB | 00m00s [478/574] libdeflate-0:1.20-1.fc40.aarc 100% | 10.3 MiB/s | 63.2 KiB | 00m00s [479/574] libgeotiff-0:1.7.1-12.fc40.aa 100% | 6.1 MiB/s | 106.5 KiB | 00m00s [480/574] libgta-0:1.2.1-12.fc40.aarch6 100% | 1.9 MiB/s | 35.1 KiB | 00m00s [481/574] libkml-0:1.3.0-47.fc40.aarch6 100% | 29.5 MiB/s | 332.4 KiB | 00m00s [482/574] libpq-0:16.1-4.fc40.aarch64 100% | 27.4 MiB/s | 252.6 KiB | 00m00s [483/574] libqhull_r-1:8.0.2-4.fc40.aar 100% | 17.3 MiB/s | 195.3 KiB | 00m00s [484/574] libarrow-0:15.0.2-3.fc40.aarc 100% | 49.9 MiB/s | 4.8 MiB | 00m00s [485/574] ogdi-0:4.1.1-1.fc40.aarch64 100% | 12.8 MiB/s | 235.9 KiB | 00m00s [486/574] unixODBC-0:2.3.12-4.fc40.aarc 100% | 31.0 MiB/s | 476.0 KiB | 00m00s [487/574] libspatialite-0:5.1.0-5.fc40. 
100% | 42.6 MiB/s | 2.9 MiB | 00m00s [488/574] xerces-c-0:3.2.5-2.fc40.aarch 100% | 33.5 MiB/s | 891.8 KiB | 00m00s [489/574] mariadb-connector-c-config-0: 100% | 793.7 KiB/s | 8.7 KiB | 00m00s [490/574] blosc-0:1.21.5-4.fc40.aarch64 100% | 5.9 MiB/s | 48.4 KiB | 00m00s [491/574] gdk-pixbuf2-modules-0:2.42.10 100% | 10.6 MiB/s | 87.0 KiB | 00m00s [492/574] libdicom-0:1.0.5-3.fc40.aarch 100% | 5.4 MiB/s | 88.6 KiB | 00m00s [493/574] proj-data-0:9.3.1-3.fc40.noar 100% | 44.4 MiB/s | 1.3 MiB | 00m00s [494/574] nss-softokn-freebl-0:3.98.0-1 100% | 16.9 MiB/s | 345.4 KiB | 00m00s [495/574] gnupg2-0:2.4.4-1.fc40.aarch64 100% | 51.4 MiB/s | 2.7 MiB | 00m00s [496/574] SuperLU-0:6.0.1-3.fc40.aarch6 100% | 10.0 MiB/s | 173.5 KiB | 00m00s [497/574] minizip-ng-compat-0:3.0.10-7. 100% | 8.0 MiB/s | 65.2 KiB | 00m00s [498/574] arpack-0:3.9.1-3.fc40.aarch64 100% | 10.4 MiB/s | 181.8 KiB | 00m00s [499/574] libarrow-doc-0:15.0.2-3.fc40. 100% | 1.4 MiB/s | 28.5 KiB | 00m00s [500/574] liborc1-0:1.9.3-1.fc40.aarch6 100% | 16.1 MiB/s | 462.4 KiB | 00m00s [501/574] re2-1:20220601-5.fc40.aarch64 100% | 5.0 MiB/s | 194.5 KiB | 00m00s [502/574] utf8proc-0:2.7.0-7.fc40.aarch 100% | 2.2 MiB/s | 80.2 KiB | 00m00s [503/574] uriparser-0:0.9.7-5.fc40.aarc 100% | 2.2 MiB/s | 56.9 KiB | 00m00s [504/574] librttopo-0:1.1.0-14.fc40.aar 100% | 11.1 MiB/s | 205.0 KiB | 00m00s [505/574] llvm17-libs-0:17.0.6-7.fc40.a 100% | 52.7 MiB/s | 25.6 MiB | 00m00s [506/574] libksba-0:1.6.6-1.fc40.aarch6 100% | 3.5 MiB/s | 158.0 KiB | 00m00s [507/574] tpm2-tss-0:4.0.1-7.fc40.aarch 100% | 29.1 MiB/s | 387.6 KiB | 00m00s [508/574] flexiblas-openblas-openmp64-0 100% | 2.4 MiB/s | 16.9 KiB | 00m00s [509/574] npth-0:1.7-1.fc40.aarch64 100% | 410.9 KiB/s | 25.1 KiB | 00m00s [510/574] flexiblas-netlib64-0:3.4.2-1. 
100% | 94.1 MiB/s | 2.6 MiB | 00m00s [511/574] isl-0:0.16.1-20.fc40.aarch64 100% | 63.2 MiB/s | 841.1 KiB | 00m00s [512/574] glibc-devel-0:2.39.9999-99.fc 100% | 129.5 MiB/s | 530.4 KiB | 00m00s [513/574] kernel-headers-0:6.8.3-300.fc 100% | 62.4 MiB/s | 1.6 MiB | 00m00s [514/574] cuda-gcc-12-c++-0:12.3.1-1.fc 100% | 169.6 MiB/s | 13.6 MiB | 00m00s [515/574] libxcrypt-devel-0:4.4.36-5.fc 100% | 1.0 MiB/s | 28.6 KiB | 00m00s [516/574] gcc-plugin-annobin-0:14.0.1-0 100% | 22.0 MiB/s | 45.1 KiB | 00m00s [517/574] annobin-docs-0:12.42-1.fc40.n 100% | 17.4 MiB/s | 89.1 KiB | 00m00s [518/574] annobin-plugin-gcc-0:12.42-1. 100% | 93.4 MiB/s | 956.5 KiB | 00m00s [519/574] pyproject-rpm-macros-0:1.12.0 100% | 6.7 MiB/s | 41.4 KiB | 00m00s [520/574] python-rpm-macros-0:3.12-7.fc 100% | 3.5 MiB/s | 18.0 KiB | 00m00s [521/574] python3-rpm-macros-0:3.12-7.f 100% | 2.1 MiB/s | 12.8 KiB | 00m00s [522/574] python3-rpm-generators-0:14-1 100% | 3.6 MiB/s | 29.6 KiB | 00m00s [523/574] cmake-rpm-macros-0:3.28.2-1.f 100% | 1.8 MiB/s | 17.0 KiB | 00m00s [524/574] python3-packaging-0:23.2-4.fc 100% | 11.1 MiB/s | 125.2 KiB | 00m00s [525/574] cuda-gcc-12-0:12.3.1-1.fc39.a 100% | 144.0 MiB/s | 28.7 MiB | 00m00s [526/574] libcufft-12-3-0:11.0.12.1-2.a 100% | 91.0 MiB/s | 60.4 MiB | 00m01s [527/574] libnpp-12-3-0:12.2.3.2-2.aarc 100% | 104.5 MiB/s | 95.9 MiB | 00m01s [528/574] libcublas-12-3-0:12.3.4.1-2.a 100% | 108.6 MiB/s | 245.4 MiB | 00m02s [529/574] cuda-toolkit-12-3-config-comm 100% | 3.8 MiB/s | 7.7 KiB | 00m00s [530/574] cuda-toolkit-12-config-common 100% | 3.8 MiB/s | 7.9 KiB | 00m00s [531/574] cuda-toolkit-config-common-0: 100% | 3.8 MiB/s | 7.9 KiB | 00m00s [532/574] cuda-cccl-12-3-0:12.3.101-1.a 100% | 237.0 MiB/s | 1.9 MiB | 00m00s [533/574] libcusparse-12-3-0:12.2.0.103 100% | 78.9 MiB/s | 108.2 MiB | 00m01s [534/574] systemd-0:255.4-1.fc40.aarch6 100% | 133.6 MiB/s | 4.8 MiB | 00m00s [535/574] systemd-rpm-macros-0:255.4-1. 
100% | 3.8 MiB/s | 30.7 KiB | 00m00s [536/574] dbus-1:1.14.10-3.fc40.aarch64 100% | 3.9 MiB/s | 8.0 KiB | 00m00s [537/574] kmod-libs-0:31-5.fc40.aarch64 100% | 33.1 MiB/s | 67.7 KiB | 00m00s [538/574] libseccomp-0:2.5.3-8.fc40.aar 100% | 5.0 MiB/s | 71.6 KiB | 00m00s [539/574] systemd-pam-0:255.4-1.fc40.aa 100% | 25.1 MiB/s | 385.4 KiB | 00m00s [540/574] dbus-broker-0:35-4.fc40.aarch 100% | 3.0 MiB/s | 168.0 KiB | 00m00s [541/574] dbus-common-1:1.14.10-3.fc40. 100% | 509.7 KiB/s | 14.8 KiB | 00m00s [542/574] metis-0:5.2.1-20230403.0.gite 100% | 9.4 MiB/s | 174.0 KiB | 00m00s [543/574] gklib-0:5.1.1-20230326.0.git8 100% | 2.1 MiB/s | 92.9 KiB | 00m00s [544/574] pcre-0:8.45-1.fc40.6.aarch64 100% | 10.9 MiB/s | 189.0 KiB | 00m00s [545/574] perl-Pod-Usage-4:2.03-503.fc4 100% | 697.1 KiB/s | 39.7 KiB | 00m00s [546/574] perl-Pod-Perldoc-0:3.28.01-50 100% | 3.0 MiB/s | 85.6 KiB | 00m00s [547/574] perl-podlators-1:5.01-502.fc4 100% | 2.0 MiB/s | 125.5 KiB | 00m00s [548/574] libcusolver-12-3-0:11.5.4.101 100% | 128.3 MiB/s | 76.3 MiB | 00m01s [549/574] groff-base-0:1.23.0-6.fc40.aa 100% | 14.0 MiB/s | 1.1 MiB | 00m00s [550/574] perl-File-Temp-1:0.231.100-50 100% | 9.6 MiB/s | 59.0 KiB | 00m00s [551/574] perl-HTTP-Tiny-0:0.088-5.fc40 100% | 9.0 MiB/s | 55.6 KiB | 00m00s [552/574] perl-Pod-Simple-1:3.45-6.fc40 100% | 12.6 MiB/s | 218.5 KiB | 00m00s [553/574] perl-Term-ANSIColor-0:5.01-50 100% | 3.1 MiB/s | 47.6 KiB | 00m00s [554/574] perl-Term-Cap-0:1.18-503.fc40 100% | 21.4 MiB/s | 21.9 KiB | 00m00s [555/574] perl-File-Path-0:2.18-503.fc4 100% | 11.4 MiB/s | 35.0 KiB | 00m00s [556/574] perl-IO-Socket-SSL-0:2.085-1. 100% | 31.9 MiB/s | 228.6 KiB | 00m00s [557/574] perl-Mozilla-CA-0:20231213-3. 
100% | 2.3 MiB/s | 13.9 KiB | 00m00s [558/574] perl-Time-Local-2:1.350-5.fc4 100% | 1.2 MiB/s | 34.3 KiB | 00m00s [559/574] perl-Net-SSLeay-0:1.94-3.fc40 100% | 11.8 MiB/s | 375.0 KiB | 00m00s [560/574] perl-Text-Tabs+Wrap-0:2024.00 100% | 7.0 MiB/s | 21.6 KiB | 00m00s [561/574] perl-Pod-Escapes-1:1.07-503.f 100% | 3.2 MiB/s | 19.6 KiB | 00m00s [562/574] perl-if-0:0.61.000-506.fc40.n 100% | 3.5 MiB/s | 14.4 KiB | 00m00s [563/574] perl-IO-Socket-IP-0:0.42-2.fc 100% | 20.4 MiB/s | 41.7 KiB | 00m00s [564/574] ncurses-0:6.4-12.20240127.fc4 100% | 41.0 MiB/s | 420.2 KiB | 00m00s [565/574] perl-URI-0:5.27-1.fc40.noarch 100% | 6.8 MiB/s | 132.5 KiB | 00m00s [566/574] perl-AutoLoader-0:5.74-506.fc 100% | 1.4 MiB/s | 21.7 KiB | 00m00s [567/574] perl-libnet-0:3.15-503.fc40.n 100% | 15.7 MiB/s | 128.5 KiB | 00m00s [568/574] perl-Data-Dumper-0:2.188-503. 100% | 5.4 MiB/s | 54.9 KiB | 00m00s [569/574] perl-B-0:1.88-506.fc40.aarch6 100% | 43.6 MiB/s | 178.5 KiB | 00m00s [570/574] perl-FileHandle-0:2.05-506.fc 100% | 5.2 MiB/s | 15.9 KiB | 00m00s [571/574] perl-Digest-0:1.20-502.fc40.n 100% | 8.0 MiB/s | 24.6 KiB | 00m00s [572/574] perl-Digest-MD5-0:2.59-3.fc40 100% | 2.5 MiB/s | 35.8 KiB | 00m00s [573/574] hdf-libs-0:4.2.16.2-1.fc40.aa 100% | 27.0 MiB/s | 276.0 KiB | 00m00s [574/574] libcudnn8-0:8.9.7.29-2.cuda12 100% | 116.5 MiB/s | 445.9 MiB | 00m04s -------------------------------------------------------------------------------- [574/574] Total 100% | 210.8 MiB/s | 2.3 GiB | 00m11s Running transaction [ 1/576] Verify package files 100% | 57.0 B/s | 574.0 B | 00m10s [ 2/576] Prepare transaction 100% | 1.6 KiB/s | 574.0 B | 00m00s [ 3/576] Installing cmake-filesystem-0 100% | 3.5 MiB/s | 7.1 KiB | 00m00s [ 4/576] Installing libpng-2:1.6.40-3. 100% | 163.5 MiB/s | 334.9 KiB | 00m00s [ 5/576] Installing libgfortran-0:14.0 100% | 304.4 MiB/s | 1.5 MiB | 00m00s [ 6/576] Installing expat-0:2.6.0-1.fc 100% | 261.2 MiB/s | 534.9 KiB | 00m00s [ 7/576] Installing libjpeg-turbo-0:3. 
100% | 258.5 MiB/s | 794.1 KiB | 00m00s [ 8/576] Installing openblas-0:0.3.26- 100% | 95.5 MiB/s | 97.8 KiB | 00m00s [ 9/576] Installing cuda-toolkit-confi 100% | 0.0 B/s | 312.0 B | 00m00s [ 10/576] Installing cuda-toolkit-12-co 100% | 0.0 B/s | 316.0 B | 00m00s [ 11/576] Installing cuda-toolkit-12-3- 100% | 0.0 B/s | 124.0 B | 00m00s [ 12/576] Installing libdrm-0:2.4.120-3 100% | 340.7 MiB/s | 1.4 MiB | 00m00s [ 13/576] Installing libwebp-0:1.3.2-5. 100% | 309.3 MiB/s | 1.2 MiB | 00m00s [ 14/576] Installing snappy-0:1.1.10-4. 100% | 207.6 MiB/s | 212.6 KiB | 00m00s [ 15/576] Installing nspr-0:4.35.0-21.f 100% | 241.7 MiB/s | 742.4 KiB | 00m00s [ 16/576] Installing libX11-xcb-0:1.8.7 100% | 191.2 MiB/s | 195.8 KiB | 00m00s [ 17/576] Installing openjpeg2-0:2.5.2- 100% | 263.5 MiB/s | 539.6 KiB | 00m00s [ 18/576] Installing libtalloc-0:2.4.2- 100% | 192.6 MiB/s | 197.2 KiB | 00m00s [ 19/576] Installing libgpg-error-0:1.4 100% | 224.2 MiB/s | 1.1 MiB | 00m00s [ 20/576] Installing libogg-2:1.3.5-8.f 100% | 202.1 MiB/s | 206.9 KiB | 00m00s [ 21/576] Installing fonts-filesystem-1 100% | 0.0 B/s | 788.0 B | 00m00s [ 22/576] Installing urw-base35-fonts-c 100% | 37.5 MiB/s | 38.4 KiB | 00m00s [ 23/576] Installing libglvnd-1:1.7.0-4 100% | 568.4 MiB/s | 1.7 MiB | 00m00s [ 24/576] Installing libglvnd-opengl-1: 100% | 254.8 MiB/s | 521.8 KiB | 00m00s [ 25/576] Installing nss-util-0:3.98.0- 100% | 339.0 MiB/s | 347.2 KiB | 00m00s [ 26/576] Installing cuda-cudart-12-3-0 100% | 45.0 MiB/s | 736.7 KiB | 00m00s >>> Running post-install scriptlet: cuda-cudart-12-3-0:12.3.101-1.aarch64 >>> Stop post-install scriptlet: cuda-cudart-12-3-0:12.3.101-1.aarch64 [ 27/576] Installing libcublas-12-3-0:1 100% | 356.5 MiB/s | 584.0 MiB | 00m02s >>> Running post-install scriptlet: libcublas-12-3-0:12.3.4.1-2.aarch64 >>> Stop post-install scriptlet: libcublas-12-3-0:12.3.4.1-2.aarch64 [ 28/576] Installing libwayland-client- 100% | 194.6 MiB/s | 199.2 KiB | 00m00s [ 29/576] Installing 
protobuf-compat-0: 100% | 301.8 MiB/s | 3.6 MiB | 00m00s [ 30/576] Installing libuv-1:1.48.0-1.f 100% | 212.7 MiB/s | 653.3 KiB | 00m00s [ 31/576] Installing gflags-0:2.2.2-14. 100% | 181.9 MiB/s | 558.8 KiB | 00m00s [ 32/576] Installing libmpc-0:1.3.1-5.f 100% | 275.6 MiB/s | 282.2 KiB | 00m00s [ 33/576] Installing libcublas-devel-12 100% | 356.1 MiB/s | 729.2 KiB | 00m00s [ 34/576] Installing libtheora-1:1.1.1- 100% | 278.4 MiB/s | 855.3 KiB | 00m00s [ 35/576] Installing libvorbis-1:1.3.7- 100% | 316.5 MiB/s | 1.3 MiB | 00m00s [ 36/576] Installing libgcrypt-0:1.10.3 100% | 265.0 MiB/s | 1.1 MiB | 00m00s [ 37/576] Installing libassuan-0:2.5.7- 100% | 137.5 MiB/s | 281.6 KiB | 00m00s [ 38/576] Installing libtevent-0:0.16.1 100% | 194.1 MiB/s | 198.7 KiB | 00m00s [ 39/576] Installing openblas-openmp-0: 100% | 423.0 MiB/s | 19.5 MiB | 00m00s [ 40/576] Installing libcudnn8-0:8.9.7. 100% | 348.1 MiB/s | 1.0 GiB | 00m03s [ 41/576] Installing python-rpm-macros- 100% | 22.3 MiB/s | 22.8 KiB | 00m00s [ 42/576] Installing geos-0:3.12.1-3.fc 100% | 341.4 MiB/s | 3.8 MiB | 00m00s [ 43/576] Installing libtdb-0:1.4.10-1. 
100% | 193.2 MiB/s | 197.9 KiB | 00m00s [ 44/576] Installing libaec-0:1.1.2-1.f 100% | 22.4 MiB/s | 412.4 KiB | 00m00s [ 45/576] Installing hdf5-0:1.12.1-15.f 100% | 254.0 MiB/s | 12.4 MiB | 00m00s [ 46/576] Installing libICE-0:1.1.1-3.f 100% | 134.0 MiB/s | 274.4 KiB | 00m00s [ 47/576] Installing lcms2-0:2.16-3.fc4 100% | 158.3 MiB/s | 486.4 KiB | 00m00s [ 48/576] Installing libunwind-0:1.8.0- 100% | 297.1 MiB/s | 608.4 KiB | 00m00s [ 49/576] Installing pthreadpool-1:0.1- 100% | 195.8 MiB/s | 200.5 KiB | 00m00s [ 50/576] Installing cuda-nvrtc-12-3-0: 100% | 228.1 MiB/s | 60.4 MiB | 00m00s >>> Running post-install scriptlet: cuda-nvrtc-12-3-0:12.3.107-1.aarch64 >>> Stop post-install scriptlet: cuda-nvrtc-12-3-0:12.3.107-1.aarch64 [ 51/576] Installing cpuinfo-1:0-202403 100% | 259.3 MiB/s | 796.5 KiB | 00m00s [ 52/576] Installing lmdb-libs-0:0.9.32 100% | 102.8 MiB/s | 210.6 KiB | 00m00s [ 53/576] Installing libSM-0:1.2.4-3.fc 100% | 248.7 MiB/s | 254.6 KiB | 00m00s [ 54/576] Installing python3-rpm-macros 100% | 0.0 B/s | 6.7 KiB | 00m00s [ 55/576] Installing onnx-libs-0:1.17.0 100% | 337.8 MiB/s | 3.0 MiB | 00m00s [ 56/576] Installing libcurand-12-3-0:1 100% | 279.5 MiB/s | 91.7 MiB | 00m00s >>> Running post-install scriptlet: libcurand-12-3-0:10.3.4.107-1.aarch64 >>> Stop post-install scriptlet: libcurand-12-3-0:10.3.4.107-1.aarch64 [ 57/576] Installing libcufft-12-3-0:11 100% | 151.6 MiB/s | 169.7 MiB | 00m01s >>> Running post-install scriptlet: libcufft-12-3-0:11.0.12.1-2.aarch64 >>> Stop post-install scriptlet: libcufft-12-3-0:11.0.12.1-2.aarch64 [ 58/576] Installing libcusparse-12-3-0 100% | 146.4 MiB/s | 251.8 MiB | 00m02s >>> Running post-install scriptlet: libcusparse-12-3-0:12.2.0.103-2.aarch64 >>> Stop post-install scriptlet: libcusparse-12-3-0:12.2.0.103-2.aarch64 [ 59/576] Installing openblas-openmp64- 100% | 394.6 MiB/s | 19.3 MiB | 00m00s [ 60/576] Installing flexiblas-netlib-0 100% | 59.6 MiB/s | 9.7 MiB | 00m00s [ 61/576] Installing 
flexiblas-netlib64 100% | 326.3 MiB/s | 9.5 MiB | 00m00s [ 62/576] Installing flexiblas-openblas 100% | 191.6 MiB/s | 196.2 KiB | 00m00s [ 63/576] Installing flexiblas-0:3.4.2- 100% | 47.0 MiB/s | 48.1 KiB | 00m00s [ 64/576] Installing flexiblas-openblas 100% | 191.6 MiB/s | 196.2 KiB | 00m00s [ 65/576] Installing suitesparse-0:7.6. 100% | 132.5 MiB/s | 116.0 MiB | 00m01s [ 66/576] Installing hdf-libs-0:4.2.16. 100% | 208.0 MiB/s | 852.1 KiB | 00m00s [ 67/576] Installing minizip-ng-compat- 100% | 257.2 MiB/s | 263.4 KiB | 00m00s [ 68/576] Installing freexl-0:2.0.0-7.f 100% | 108.7 MiB/s | 222.5 KiB | 00m00s [ 69/576] Installing json-c-0:0.17-3.fc 100% | 198.8 MiB/s | 203.6 KiB | 00m00s [ 70/576] Installing libevdev-0:1.13.1- 100% | 194.5 MiB/s | 199.2 KiB | 00m00s [ 71/576] Installing scotch-0:7.0.4-3.f 100% | 310.4 MiB/s | 1.2 MiB | 00m00s [ 72/576] Installing mesa-libglapi-0:24 100% | 225.4 MiB/s | 461.7 KiB | 00m00s [ 73/576] Installing libxshmfence-0:1.3 100% | 191.6 MiB/s | 196.2 KiB | 00m00s [ 74/576] Installing libwayland-server- 100% | 195.0 MiB/s | 199.7 KiB | 00m00s [ 75/576] Installing nettle-0:3.9.1-6.f 100% | 233.6 MiB/s | 956.7 KiB | 00m00s [ 76/576] Installing gnutls-0:3.8.3-2.f 100% | 48.9 MiB/s | 3.4 MiB | 00m00s [ 77/576] Installing glib2-0:2.80.0-1.f 100% | 328.1 MiB/s | 16.4 MiB | 00m00s [ 78/576] Installing shared-mime-info-0 100% | 157.0 MiB/s | 2.7 MiB | 00m00s >>> Running post-install scriptlet: shared-mime-info-0:2.3-4.fc40.aarch64 >>> Stop post-install scriptlet: shared-mime-info-0:2.3-4.fc40.aarch64 [ 79/576] Installing gdk-pixbuf2-0:2.42 100% | 181.7 MiB/s | 2.9 MiB | 00m00s [ 80/576] Installing libgudev-0:238-5.f 100% | 114.0 MiB/s | 233.4 KiB | 00m00s [ 81/576] Installing libXau-0:1.0.11-6. 
100% | 119.3 MiB/s | 244.3 KiB | 00m00s [ 82/576] Installing libxcb-0:1.16-4.fc 100% | 420.1 MiB/s | 5.0 MiB | 00m00s [ 83/576] Installing mesa-libgbm-0:24.0 100% | 193.5 MiB/s | 198.2 KiB | 00m00s [ 84/576] Installing libglvnd-egl-1:1.7 100% | 193.8 MiB/s | 198.5 KiB | 00m00s [ 85/576] Installing mesa-libEGL-0:24.0 100% | 193.7 MiB/s | 396.6 KiB | 00m00s [ 86/576] Installing protobuf-0:3.19.6- 100% | 324.5 MiB/s | 3.2 MiB | 00m00s [ 87/576] Installing pcre2-utf16-0:10.4 100% | 315.9 MiB/s | 646.9 KiB | 00m00s [ 88/576] Installing libicu-0:74.2-1.fc 100% | 348.6 MiB/s | 35.9 MiB | 00m00s [ 89/576] Installing double-conversion- 100% | 100.7 MiB/s | 206.2 KiB | 00m00s [ 90/576] Installing dbus-libs-1:1.14.1 100% | 239.3 MiB/s | 490.2 KiB | 00m00s [ 91/576] Installing avahi-libs-0:0.8-2 100% | 301.2 MiB/s | 616.8 KiB | 00m00s [ 92/576] Installing cups-libs-1:2.4.7- 100% | 300.9 MiB/s | 924.4 KiB | 00m00s [ 93/576] Installing imath-0:3.1.10-1.f 100% | 166.3 MiB/s | 511.0 KiB | 00m00s [ 94/576] Installing openexr-libs-0:3.1 100% | 429.0 MiB/s | 6.9 MiB | 00m00s [ 95/576] Installing liblerc-0:4.0.0-6. 
100% | 199.2 MiB/s | 611.9 KiB | 00m00s [ 96/576] Installing svt-av1-libs-0:1.4 100% | 314.7 MiB/s | 3.5 MiB | 00m00s [ 97/576] Installing rav1e-libs-0:0.7.1 100% | 303.1 MiB/s | 2.1 MiB | 00m00s [ 98/576] Installing libdav1d-0:1.4.0-1 100% | 299.9 MiB/s | 921.4 KiB | 00m00s [ 99/576] Installing libaom-0:3.8.2-1.f 100% | 307.4 MiB/s | 3.7 MiB | 00m00s [100/576] Installing opus-0:1.5.1-1.fc4 100% | 254.3 MiB/s | 520.9 KiB | 00m00s [101/576] Installing asl-0:20240106-1.2 100% | 356.4 MiB/s | 2.5 MiB | 00m00s [102/576] Installing xorg-x11-proto-dev 100% | 175.6 MiB/s | 1.8 MiB | 00m00s [103/576] Installing libedit-0:3.1-50.2 100% | 168.7 MiB/s | 345.5 KiB | 00m00s [104/576] Installing adobe-mappings-cma 100% | 288.3 MiB/s | 14.4 MiB | 00m00s >>> Running pre-install scriptlet: xml-common-0:0.6.3-63.fc40.noarch >>> Stop pre-install scriptlet: xml-common-0:0.6.3-63.fc40.noarch [105/576] Installing xml-common-0:0.6.3 100% | 39.6 MiB/s | 81.1 KiB | 00m00s [106/576] Installing openpgm-0:5.2.122- 100% | 203.9 MiB/s | 417.5 KiB | 00m00s [107/576] Installing libsodium-0:1.0.19 100% | 192.0 MiB/s | 393.2 KiB | 00m00s [108/576] Installing zeromq-0:4.3.5-16. 100% | 203.8 MiB/s | 1.2 MiB | 00m00s [109/576] Installing libnl3-0:3.9.0-3.f 100% | 287.8 MiB/s | 1.7 MiB | 00m00s [110/576] Installing libibverbs-0:48.0- 100% | 355.2 MiB/s | 3.9 MiB | 00m00s [111/576] Installing jsoncpp-0:1.9.5-7. 100% | 164.6 MiB/s | 337.2 KiB | 00m00s [112/576] Installing fftw-libs-single-0 100% | 294.8 MiB/s | 2.4 MiB | 00m00s [113/576] Installing fftw-libs-long-0:3 100% | 303.6 MiB/s | 2.7 MiB | 00m00s [114/576] Installing fftw-libs-double-0 100% | 328.0 MiB/s | 2.3 MiB | 00m00s [115/576] Installing libnccl-0:2.21.5-1 100% | 31.8 MiB/s | 230.3 MiB | 00m07s >>> Running post-install scriptlet: libnccl-0:2.21.5-1+cuda12.4.aarch64 >>> Stop post-install scriptlet: libnccl-0:2.21.5-1+cuda12.4.aarch64 [116/576] Installing tbb-0:2021.11.0-5. 
100% | 212.7 MiB/s | 871.0 KiB | 00m00s [117/576] Installing libibumad-0:48.0-4 100% | 192.2 MiB/s | 196.8 KiB | 00m00s [118/576] Installing ocl-icd-0:2.3.2-5. 100% | 139.0 MiB/s | 284.6 KiB | 00m00s [119/576] Installing hiredis-0:1.0.2-7. 100% | 97.3 MiB/s | 199.3 KiB | 00m00s [120/576] Installing flatbuffers-0:23.5 100% | 58.5 MiB/s | 599.3 KiB | 00m00s [121/576] Installing gloo-1:0.5.0-20240 100% | 342.1 MiB/s | 3.8 MiB | 00m00s [122/576] Installing fftw-0:3.3.10-11.f 100% | 296.8 MiB/s | 607.8 KiB | 00m00s [123/576] Installing fftw-libs-0:3.3.10 100% | 0.0 B/s | 124.0 B | 00m00s [124/576] Installing librdmacm-0:48.0-4 100% | 226.5 MiB/s | 463.9 KiB | 00m00s [125/576] Installing libsodium-devel-0: 100% | 239.4 MiB/s | 3.8 MiB | 00m00s [126/576] Installing openpgm-devel-0:5. 100% | 67.9 MiB/s | 347.7 KiB | 00m00s [127/576] Installing iso-codes-0:4.16.0 100% | 223.8 MiB/s | 19.0 MiB | 00m00s [128/576] Installing adobe-mappings-cma 100% | 190.5 MiB/s | 585.2 KiB | 00m00s [129/576] Installing llvm17-libs-0:17.0 100% | 362.7 MiB/s | 110.6 MiB | 00m00s >>> Running post-install scriptlet: llvm17-libs-0:17.0.6-7.fc40.aarch64 >>> Stop post-install scriptlet: llvm17-libs-0:17.0.6-7.fc40.aarch64 [130/576] Installing halide-0:17.0.1-20 100% | 373.8 MiB/s | 134.2 MiB | 00m00s [131/576] Installing libXau-devel-0:1.0 100% | 1.1 MiB/s | 8.2 KiB | 00m00s [132/576] Installing libxcb-devel-0:1.1 100% | 50.1 MiB/s | 3.1 MiB | 00m00s [133/576] Installing libavif-0:1.0.4-1. 100% | 137.2 MiB/s | 281.0 KiB | 00m00s [134/576] Installing liborc1-0:1.9.3-1. 100% | 278.2 MiB/s | 1.7 MiB | 00m00s [135/576] Installing libglvnd-gles-1:1. 100% | 318.1 MiB/s | 651.5 KiB | 00m00s [136/576] Installing xcb-util-keysyms-0 100% | 193.2 MiB/s | 197.8 KiB | 00m00s [137/576] Installing xcb-util-renderuti 100% | 197.2 MiB/s | 201.9 KiB | 00m00s [138/576] Installing xcb-util-wm-0:0.4. 
100% | 386.1 MiB/s | 395.4 KiB | 00m00s [139/576] Installing xcb-util-0:0.4.1-5 100% | 195.1 MiB/s | 199.7 KiB | 00m00s [140/576] Installing xcb-util-image-0:0 100% | 194.9 MiB/s | 199.6 KiB | 00m00s [141/576] Installing graphene-0:1.10.6- 100% | 238.5 MiB/s | 244.3 KiB | 00m00s [142/576] Installing srt-libs-0:1.5.3-2 100% | 225.6 MiB/s | 924.1 KiB | 00m00s [143/576] Installing scotch-devel-0:7.0 100% | 18.7 MiB/s | 459.8 KiB | 00m00s >>> Running pre-install scriptlet: tpm2-tss-0:4.0.1-7.fc40.aarch64 >>> Stop pre-install scriptlet: tpm2-tss-0:4.0.1-7.fc40.aarch64 [144/576] Installing tpm2-tss-0:4.0.1-7 100% | 290.8 MiB/s | 3.2 MiB | 00m00s [145/576] Installing glpk-0:5.0-11.fc40 100% | 214.9 MiB/s | 880.1 KiB | 00m00s [146/576] Installing coin-or-CoinUtils- 100% | 242.2 MiB/s | 1.2 MiB | 00m00s [147/576] Installing coin-or-Osi-0:0.10 100% | 311.9 MiB/s | 5.6 MiB | 00m00s [148/576] Installing arpack-0:3.9.1-3.f 100% | 198.3 MiB/s | 812.0 KiB | 00m00s [149/576] Installing libcusparse-devel- 100% | 111.9 MiB/s | 252.2 MiB | 00m02s [150/576] Installing magma-0:2.8.0-2024 100% | 302.6 MiB/s | 230.9 MiB | 00m01s [151/576] Installing libcufft-devel-12- 100% | 32.3 MiB/s | 132.3 KiB | 00m00s [152/576] Installing pyproject-rpm-macr 100% | 98.4 MiB/s | 100.8 KiB | 00m00s [153/576] Installing lmdb-0:0.9.32-1.fc 100% | 385.2 MiB/s | 788.9 KiB | 00m00s [154/576] Installing libldb-0:2.9.0-1.f 100% | 428.7 MiB/s | 3.0 MiB | 00m00s [155/576] Installing nnpack-0:0-2023020 100% | 266.8 MiB/s | 273.2 KiB | 00m00s [156/576] Installing qnnpack-0:0-201908 100% | 202.6 MiB/s | 207.4 KiB | 00m00s [157/576] Installing libunwind-devel-0: 100% | 75.1 MiB/s | 153.9 KiB | 00m00s [158/576] Installing cgnslib-libs-0:4.4 100% | 299.1 MiB/s | 918.9 KiB | 00m00s [159/576] Installing librttopo-0:1.1.0- 100% | 177.9 MiB/s | 546.6 KiB | 00m00s [160/576] Installing cpp-0:14.0.1-0.13. 
100% | 317.9 MiB/s | 31.8 MiB | 00m00s [161/576] Installing cuda-gcc-12-0:12.3 100% | 352.8 MiB/s | 100.5 MiB | 00m00s [162/576] Installing gflags-devel-0:2.2 100% | 31.6 MiB/s | 64.6 KiB | 00m00s [163/576] Installing glog-0:0.3.5-20.fc 100% | 43.8 MiB/s | 268.8 KiB | 00m00s [164/576] Installing ceres-solver-0:2.2 100% | 431.5 MiB/s | 5.2 MiB | 00m00s [165/576] Installing libuv-static-1:1.4 100% | 205.1 MiB/s | 420.1 KiB | 00m00s [166/576] Installing libuv-devel-1:1.48 100% | 68.0 MiB/s | 209.0 KiB | 00m00s [167/576] Installing tensorpipe-0:0-202 100% | 306.2 MiB/s | 2.8 MiB | 00m00s [168/576] Installing protobuf-compat-co 100% | 103.0 MiB/s | 3.1 MiB | 00m00s [169/576] Installing libwayland-cursor- 100% | 205.2 MiB/s | 210.1 KiB | 00m00s [170/576] Installing nss-softokn-freebl 100% | 243.6 MiB/s | 997.8 KiB | 00m00s [171/576] Installing nss-softokn-0:3.98 100% | 371.1 MiB/s | 2.6 MiB | 00m00s [172/576] Installing mesa-libGLU-0:9.0. 100% | 192.6 MiB/s | 394.4 KiB | 00m00s [173/576] Installing urw-base35-bookman 100% | 91.0 MiB/s | 1.4 MiB | 00m00s >>> Running post-install scriptlet: urw-base35-bookman-fonts-0:20200910-19.fc40. 
>>> Stop post-install scriptlet: urw-base35-bookman-fonts-0:20200910-19.fc40.noa [174/576] Installing urw-base35-c059-fo 100% | 126.8 MiB/s | 1.4 MiB | 00m00s >>> Running post-install scriptlet: urw-base35-c059-fonts-0:20200910-19.fc40.noa >>> Stop post-install scriptlet: urw-base35-c059-fonts-0:20200910-19.fc40.noarch [175/576] Installing urw-base35-d050000 100% | 11.9 MiB/s | 85.4 KiB | 00m00s >>> Running post-install scriptlet: urw-base35-d050000l-fonts-0:20200910-19.fc40 >>> Stop post-install scriptlet: urw-base35-d050000l-fonts-0:20200910-19.fc40.no [176/576] Installing urw-base35-gothic- 100% | 116.3 MiB/s | 1.2 MiB | 00m00s >>> Running post-install scriptlet: urw-base35-gothic-fonts-0:20200910-19.fc40.n >>> Stop post-install scriptlet: urw-base35-gothic-fonts-0:20200910-19.fc40.noar [177/576] Installing urw-base35-nimbus- 100% | 116.9 MiB/s | 1.1 MiB | 00m00s >>> Running post-install scriptlet: urw-base35-nimbus-mono-ps-fonts-0:20200910-1 >>> Stop post-install scriptlet: urw-base35-nimbus-mono-ps-fonts-0:20200910-19.f [178/576] Installing urw-base35-nimbus- 100% | 124.2 MiB/s | 1.4 MiB | 00m00s >>> Running post-install scriptlet: urw-base35-nimbus-roman-fonts-0:20200910-19. 
>>> Stop post-install scriptlet: urw-base35-nimbus-roman-fonts-0:20200910-19.fc4 [179/576] Installing urw-base35-nimbus- 100% | 171.0 MiB/s | 2.4 MiB | 00m00s >>> Running post-install scriptlet: urw-base35-nimbus-sans-fonts-0:20200910-19.f >>> Stop post-install scriptlet: urw-base35-nimbus-sans-fonts-0:20200910-19.fc40 [180/576] Installing urw-base35-p052-fo 100% | 135.2 MiB/s | 1.5 MiB | 00m00s >>> Running post-install scriptlet: urw-base35-p052-fonts-0:20200910-19.fc40.noa >>> Stop post-install scriptlet: urw-base35-p052-fonts-0:20200910-19.fc40.noarch [181/576] Installing urw-base35-standar 100% | 6.3 MiB/s | 45.1 KiB | 00m00s >>> Running post-install scriptlet: urw-base35-standard-symbols-ps-fonts-0:20200 >>> Stop post-install scriptlet: urw-base35-standard-symbols-ps-fonts-0:20200910 [182/576] Installing urw-base35-z003-fo 100% | 47.8 MiB/s | 391.8 KiB | 00m00s >>> Running post-install scriptlet: urw-base35-z003-fonts-0:20200910-19.fc40.noa >>> Stop post-install scriptlet: urw-base35-z003-fonts-0:20200910-19.fc40.noarch [183/576] Installing urw-base35-fonts-0 100% | 5.5 MiB/s | 5.6 KiB | 00m00s [184/576] Installing abattis-cantarell- 100% | 94.9 MiB/s | 194.4 KiB | 00m00s [185/576] Installing libksba-0:1.6.6-1. 
100% | 171.7 MiB/s | 527.4 KiB | 00m00s [186/576] Installing leveldb-0:1.23-9.f 100% | 199.4 MiB/s | 408.4 KiB | 00m00s [187/576] Installing blosc-0:1.21.5-4.f 100% | 127.2 MiB/s | 260.6 KiB | 00m00s [188/576] Installing netcdf-0:4.9.2-5.f 100% | 338.2 MiB/s | 4.7 MiB | 00m00s [189/576] Installing libnvjitlink-12-3- 100% | 247.2 MiB/s | 46.0 MiB | 00m00s >>> Running post-install scriptlet: libnvjitlink-12-3-0:12.3.101-1.aarch64 >>> Stop post-install scriptlet: libnvjitlink-12-3-0:12.3.101-1.aarch64 [190/576] Installing libnpp-12-3-0:12.2 100% | 366.4 MiB/s | 234.9 MiB | 00m01s >>> Running post-install scriptlet: libnpp-12-3-0:12.2.3.2-2.aarch64 >>> Stop post-install scriptlet: libnpp-12-3-0:12.2.3.2-2.aarch64 [191/576] Installing libcusolver-12-3-0 100% | 358.9 MiB/s | 185.5 MiB | 00m01s >>> Running post-install scriptlet: libcusolver-12-3-0:11.5.4.101-2.aarch64 >>> Stop post-install scriptlet: libcusolver-12-3-0:11.5.4.101-2.aarch64 [192/576] Installing openblas-openmp64_ 100% | 420.1 MiB/s | 19.3 MiB | 00m00s [193/576] Installing openblas-serial-0: 100% | 439.3 MiB/s | 18.4 MiB | 00m00s [194/576] Installing openblas-serial64- 100% | 398.4 MiB/s | 18.3 MiB | 00m00s [195/576] Installing openblas-serial64_ 100% | 373.8 MiB/s | 18.3 MiB | 00m00s [196/576] Installing openblas-threads-0 100% | 108.1 MiB/s | 19.5 MiB | 00m00s [197/576] Installing openblas-threads64 100% | 126.4 MiB/s | 19.3 MiB | 00m00s [198/576] Installing openblas-threads64 100% | 93.4 MiB/s | 19.3 MiB | 00m00s [199/576] Installing ogdi-0:4.1.1-1.fc4 100% | 355.9 MiB/s | 2.1 MiB | 00m00s [200/576] Installing zvbi-0:0.2.35-22.f 100% | 149.9 MiB/s | 1.9 MiB | 00m00s >>> Running post-install scriptlet: zvbi-0:0.2.35-22.fc40.aarch64 >>> Stop post-install scriptlet: zvbi-0:0.2.35-22.fc40.aarch64 [201/576] Installing libharu-0:2.4.3-5. 
100% | 165.8 MiB/s | 1.8 MiB | 00m00s [202/576] Installing ncurses-0:6.4-12.2 100% | 129.4 MiB/s | 1.7 MiB | 00m00s >>> Running pre-install scriptlet: groff-base-0:1.23.0-6.fc40.aarch64 >>> Stop pre-install scriptlet: groff-base-0:1.23.0-6.fc40.aarch64 [203/576] Installing groff-base-0:1.23. 100% | 125.3 MiB/s | 5.4 MiB | 00m00s >>> Running post-install scriptlet: groff-base-0:1.23.0-6.fc40.aarch64 >>> Stop post-install scriptlet: groff-base-0:1.23.0-6.fc40.aarch64 [204/576] Installing perl-Digest-0:1.20 100% | 18.0 MiB/s | 37.0 KiB | 00m00s [205/576] Installing perl-B-0:1.88-506. 100% | 197.8 MiB/s | 607.7 KiB | 00m00s [206/576] Installing perl-FileHandle-0: 100% | 9.5 MiB/s | 9.8 KiB | 00m00s [207/576] Installing perl-Digest-MD5-0: 100% | 114.1 MiB/s | 233.6 KiB | 00m00s [208/576] Installing perl-Data-Dumper-0 100% | 129.7 MiB/s | 265.5 KiB | 00m00s [209/576] Installing perl-libnet-0:3.15 100% | 95.8 MiB/s | 294.3 KiB | 00m00s [210/576] Installing perl-AutoLoader-0: 100% | 20.5 MiB/s | 20.9 KiB | 00m00s [211/576] Installing perl-URI-0:5.27-1. 
100% | 61.4 MiB/s | 251.4 KiB | 00m00s [212/576] Installing perl-locale-0:1.10 100% | 0.0 B/s | 6.6 KiB | 00m00s [213/576] Installing perl-File-Path-0:2 100% | 63.0 MiB/s | 64.5 KiB | 00m00s [214/576] Installing perl-Mozilla-CA-0: 100% | 9.9 MiB/s | 10.2 KiB | 00m00s [215/576] Installing perl-Time-Local-2: 100% | 68.9 MiB/s | 70.5 KiB | 00m00s [216/576] Installing perl-Pod-Escapes-1 100% | 25.3 MiB/s | 25.9 KiB | 00m00s [217/576] Installing perl-Text-Tabs+Wra 100% | 23.3 MiB/s | 23.8 KiB | 00m00s [218/576] Installing perl-if-0:0.61.000 100% | 0.0 B/s | 6.2 KiB | 00m00s [219/576] Installing perl-IO-Socket-IP- 100% | 49.0 MiB/s | 100.4 KiB | 00m00s [220/576] Installing perl-Net-SSLeay-0: 100% | 179.1 MiB/s | 1.4 MiB | 00m00s [221/576] Installing perl-IO-Socket-SSL 100% | 224.3 MiB/s | 689.0 KiB | 00m00s [222/576] Installing perl-POSIX-0:2.13- 100% | 159.3 MiB/s | 326.3 KiB | 00m00s [223/576] Installing perl-IPC-Open3-0:1 100% | 22.7 MiB/s | 23.3 KiB | 00m00s [224/576] Installing perl-Class-Struct- 100% | 25.3 MiB/s | 25.9 KiB | 00m00s [225/576] Installing perl-Term-ANSIColo 100% | 96.8 MiB/s | 99.1 KiB | 00m00s [226/576] Installing perl-Term-Cap-0:1. 
100% | 29.8 MiB/s | 30.5 KiB | 00m00s [227/576] Installing perl-File-Temp-1:0 100% | 160.2 MiB/s | 164.0 KiB | 00m00s [228/576] Installing perl-HTTP-Tiny-0:0 100% | 75.3 MiB/s | 154.2 KiB | 00m00s [229/576] Installing perl-Pod-Simple-1: 100% | 139.0 MiB/s | 569.4 KiB | 00m00s [230/576] Installing perl-Symbol-0:1.09 100% | 7.0 MiB/s | 7.2 KiB | 00m00s [231/576] Installing perl-SelectSaver-0 100% | 2.5 MiB/s | 2.6 KiB | 00m00s [232/576] Installing perl-Socket-4:2.03 100% | 133.6 MiB/s | 273.6 KiB | 00m00s [233/576] Installing perl-File-stat-0:1 100% | 12.9 MiB/s | 13.2 KiB | 00m00s [234/576] Installing perl-Pod-Perldoc-0 100% | 82.3 MiB/s | 168.6 KiB | 00m00s [235/576] Installing perl-podlators-1:5 100% | 152.4 MiB/s | 312.1 KiB | 00m00s [236/576] Installing perl-Text-ParseWor 100% | 14.2 MiB/s | 14.5 KiB | 00m00s [237/576] Installing perl-base-0:2.27-5 100% | 0.0 B/s | 12.9 KiB | 00m00s [238/576] Installing perl-Fcntl-0:1.15- 100% | 197.0 MiB/s | 201.7 KiB | 00m00s [239/576] Installing perl-mro-0:1.28-50 100% | 205.8 MiB/s | 210.7 KiB | 00m00s [240/576] Installing perl-overloading-0 100% | 5.4 MiB/s | 5.5 KiB | 00m00s [241/576] Installing perl-IO-0:1.52-506 100% | 157.8 MiB/s | 323.3 KiB | 00m00s [242/576] Installing perl-Pod-Usage-4:2 100% | 84.2 MiB/s | 86.3 KiB | 00m00s [243/576] Installing perl-File-Basename 100% | 14.2 MiB/s | 14.6 KiB | 00m00s [244/576] Installing perl-constant-0:1. 100% | 26.7 MiB/s | 27.4 KiB | 00m00s [245/576] Installing perl-Errno-0:1.37- 100% | 8.6 MiB/s | 8.8 KiB | 00m00s [246/576] Installing perl-Scalar-List-U 100% | 45.7 MiB/s | 280.7 KiB | 00m00s [247/576] Installing perl-vars-0:1.05-5 100% | 4.2 MiB/s | 4.3 KiB | 00m00s [248/576] Installing perl-Getopt-Std-0: 100% | 11.4 MiB/s | 11.6 KiB | 00m00s [249/576] Installing perl-overload-0:1. 
100% | 70.3 MiB/s | 71.9 KiB | 00m00s [250/576] Installing perl-MIME-Base64-0 100% | 109.5 MiB/s | 224.3 KiB | 00m00s [251/576] Installing perl-parent-1:0.24 100% | 10.2 MiB/s | 10.4 KiB | 00m00s [252/576] Installing perl-Storable-1:3. 100% | 182.6 MiB/s | 373.9 KiB | 00m00s [253/576] Installing perl-Getopt-Long-1 100% | 143.3 MiB/s | 146.7 KiB | 00m00s [254/576] Installing perl-Carp-0:1.54-5 100% | 46.5 MiB/s | 47.7 KiB | 00m00s [255/576] Installing perl-Exporter-0:5. 100% | 54.2 MiB/s | 55.5 KiB | 00m00s [256/576] Installing perl-PathTools-0:3 100% | 173.9 MiB/s | 356.1 KiB | 00m00s [257/576] Installing perl-DynaLoader-0: 100% | 31.7 MiB/s | 32.5 KiB | 00m00s [258/576] Installing perl-Encode-4:3.21 100% | 330.4 MiB/s | 10.9 MiB | 00m00s [259/576] Installing perl-libs-4:5.38.2 100% | 49.4 MiB/s | 11.4 MiB | 00m00s [260/576] Installing perl-interpreter-4 100% | 2.0 MiB/s | 301.3 KiB | 00m00s [261/576] Installing infiniband-diags-0 100% | 393.9 MiB/s | 4.3 MiB | 00m00s [262/576] Installing perl-File-Find-0:1 100% | 41.4 MiB/s | 42.4 KiB | 00m00s [263/576] Installing perl-TermReadKey-0 100% | 116.3 MiB/s | 238.2 KiB | 00m00s [264/576] Installing perl-lib-0:0.65-50 100% | 0.0 B/s | 8.9 KiB | 00m00s [265/576] Installing perl-Error-1:0.170 100% | 78.5 MiB/s | 80.4 KiB | 00m00s [266/576] Installing pcre-0:8.45-1.fc40 100% | 243.5 MiB/s | 747.9 KiB | 00m00s [267/576] Installing gklib-0:5.1.1-2023 100% | 163.7 MiB/s | 335.2 KiB | 00m00s [268/576] Installing metis-0:5.2.1-2023 100% | 376.9 MiB/s | 1.5 MiB | 00m00s [269/576] Installing SuperLU-0:6.0.1-3. 
100% | 255.6 MiB/s | 523.4 KiB | 00m00s [270/576] Installing armadillo-0:12.8.1 100% | 206.6 MiB/s | 211.6 KiB | 00m00s [271/576] Installing dbus-common-1:1.14 100% | 467.3 KiB/s | 13.6 KiB | 00m00s >>> Running post-install scriptlet: dbus-common-1:1.14.10-3.fc40.noarch >>> Stop post-install scriptlet: dbus-common-1:1.14.10-3.fc40.noarch >>> Running pre-install scriptlet: dbus-broker-0:35-4.fc40.aarch64 >>> Stop pre-install scriptlet: dbus-broker-0:35-4.fc40.aarch64 [272/576] Installing dbus-broker-0:35-4 100% | 60.2 MiB/s | 616.6 KiB | 00m00s >>> Running post-install scriptlet: dbus-broker-0:35-4.fc40.aarch64 >>> Stop post-install scriptlet: dbus-broker-0:35-4.fc40.aarch64 [273/576] Installing dbus-1:1.14.10-3.f 100% | 0.0 B/s | 124.0 B | 00m00s [274/576] Installing libseccomp-0:2.5.3 100% | 119.7 MiB/s | 245.1 KiB | 00m00s [275/576] Installing kmod-libs-0:31-5.f 100% | 56.3 MiB/s | 288.2 KiB | 00m00s [276/576] Installing cuda-cccl-12-3-0:1 100% | 174.7 MiB/s | 14.1 MiB | 00m00s [277/576] Installing annobin-docs-0:12. 100% | 23.6 MiB/s | 96.6 KiB | 00m00s [278/576] Installing kernel-headers-0:6 100% | 142.5 MiB/s | 6.3 MiB | 00m00s [279/576] Installing libxcrypt-devel-0: 100% | 15.9 MiB/s | 32.6 KiB | 00m00s [280/576] Installing glibc-devel-0:2.39 100% | 118.0 MiB/s | 2.2 MiB | 00m00s [281/576] Installing isl-0:0.16.1-20.fc 100% | 313.2 MiB/s | 3.4 MiB | 00m00s [282/576] Installing npth-0:1.7-1.fc40. 100% | 108.7 MiB/s | 222.6 KiB | 00m00s [283/576] Installing gnupg2-0:2.4.4-1.f 100% | 262.8 MiB/s | 12.4 MiB | 00m00s [284/576] Installing gpgme-0:1.23.2-3.f 100% | 198.5 MiB/s | 813.2 KiB | 00m00s [285/576] Installing gpgmepp-0:1.23.2-3 100% | 255.3 MiB/s | 522.8 KiB | 00m00s [286/576] Installing uriparser-0:0.9.7- 100% | 158.4 MiB/s | 486.5 KiB | 00m00s [287/576] Installing libkml-0:1.3.0-47. 
100% | 309.3 MiB/s | 1.9 MiB | 00m00s [288/576] Installing utf8proc-0:2.7.0-7 100% | 263.6 MiB/s | 539.8 KiB | 00m00s [289/576] Installing re2-1:20220601-5.f 100% | 213.4 MiB/s | 655.7 KiB | 00m00s [290/576] Installing libarrow-doc-0:15. 100% | 113.5 MiB/s | 116.2 KiB | 00m00s [291/576] Installing libarrow-0:15.0.2- 100% | 199.0 MiB/s | 19.5 MiB | 00m00s [292/576] Installing proj-data-0:9.3.1- 100% | 86.2 MiB/s | 8.5 MiB | 00m00s [293/576] Installing libdicom-0:1.0.5-3 100% | 253.5 MiB/s | 519.2 KiB | 00m00s [294/576] Installing mariadb-connector- 100% | 988.3 KiB/s | 1.0 KiB | 00m00s [295/576] Installing mariadb-connector- 100% | 407.5 MiB/s | 2.0 MiB | 00m00s [296/576] Installing xerces-c-0:3.2.5-2 100% | 257.7 MiB/s | 3.6 MiB | 00m00s [297/576] Installing unixODBC-0:2.3.12- 100% | 30.1 MiB/s | 2.8 MiB | 00m00s [298/576] Installing libqhull_r-1:8.0.2 100% | 285.3 MiB/s | 584.3 KiB | 00m00s [299/576] Installing libpq-0:16.1-4.fc4 100% | 209.4 MiB/s | 1.0 MiB | 00m00s [300/576] Installing libgta-0:1.2.1-12. 100% | 218.3 MiB/s | 223.5 KiB | 00m00s [301/576] Installing libdeflate-0:1.20- 100% | 220.7 MiB/s | 226.0 KiB | 00m00s [302/576] Installing giflib-0:5.2.2-1.f 100% | 255.7 MiB/s | 261.8 KiB | 00m00s [303/576] Installing cfitsio-0:4.4.0-2. 100% | 250.5 MiB/s | 1.8 MiB | 00m00s [304/576] Installing libwacom-data-0:2. 
100% | 51.6 MiB/s | 686.8 KiB | 00m00s [305/576] Installing cliquer-libs-0:1.2 100% | 212.1 MiB/s | 217.2 KiB | 00m00s [306/576] Installing libnauty-0:2.8.8-3 100% | 395.6 MiB/s | 5.1 MiB | 00m00s [307/576] Installing google-noto-fonts- 100% | 17.8 MiB/s | 18.3 KiB | 00m00s [308/576] Installing google-noto-sans-v 100% | 249.8 MiB/s | 1.2 MiB | 00m00s [309/576] Installing default-fonts-core 100% | 3.6 MiB/s | 18.2 KiB | 00m00s [310/576] Installing google-droid-sans- 100% | 208.6 MiB/s | 6.3 MiB | 00m00s [311/576] Installing pugixml-0:1.13-5.f 100% | 107.6 MiB/s | 330.6 KiB | 00m00s [312/576] Installing xkeyboard-config-0 100% | 67.7 MiB/s | 6.6 MiB | 00m00s [313/576] Installing libxkbcommon-0:1.6 100% | 194.7 MiB/s | 598.1 KiB | 00m00s [314/576] Installing systemd-pam-0:255. 100% | 193.7 MiB/s | 1.4 MiB | 00m00s [315/576] Installing systemd-0:255.4-1. 100% | 72.0 MiB/s | 26.3 MiB | 00m00s >>> Running post-install scriptlet: systemd-0:255.4-1.fc40.aarch64 >>> Stop post-install scriptlet: systemd-0:255.4-1.fc40.aarch64 >>> Running pre-install scriptlet: samba-common-2:4.20.0-0.5.rc4.fc40.noarch >>> Stop pre-install scriptlet: samba-common-2:4.20.0-0.5.rc4.fc40.noarch [316/576] Installing samba-common-2:4.2 100% | 8.3 MiB/s | 143.6 KiB | 00m00s >>> Running post-install scriptlet: samba-common-2:4.20.0-0.5.rc4.fc40.noarch >>> Stop post-install scriptlet: samba-common-2:4.20.0-0.5.rc4.fc40.noarch >>> Running pre-install scriptlet: libwbclient-2:4.20.0-0.5.rc4.fc40.aarch64 >>> Stop pre-install scriptlet: libwbclient-2:4.20.0-0.5.rc4.fc40.aarch64 [317/576] Installing libwbclient-2:4.20 100% | 37.0 MiB/s | 75.7 KiB | 00m00s [318/576] Installing samba-common-libs- 100% | 87.9 MiB/s | 270.0 KiB | 00m00s [319/576] Installing samba-client-libs- 100% | 282.4 MiB/s | 20.6 MiB | 00m00s [320/576] Installing libsmbclient-2:4.2 100% | 85.6 MiB/s | 175.3 KiB | 00m00s [321/576] Installing libxkbcommon-x11-0 100% | 191.8 MiB/s | 196.4 KiB | 00m00s [322/576] Installing mtdev-0:1.1.6-8.fc 
100% | 38.8 MiB/s | 198.5 KiB | 00m00s [323/576] Installing duktape-0:2.7.0-7. 100% | 302.8 MiB/s | 930.1 KiB | 00m00s [324/576] Installing libproxy-0:0.5.3-5 100% | 211.7 MiB/s | 433.5 KiB | 00m00s [325/576] Installing zimg-0:3.0.5-2.fc4 100% | 230.9 MiB/s | 472.8 KiB | 00m00s [326/576] Installing mbedtls-0:2.28.7-1 100% | 284.0 MiB/s | 1.4 MiB | 00m00s [327/576] Installing cjson-0:1.7.15-4.f 100% | 14.7 MiB/s | 225.0 KiB | 00m00s >>> Running post-install scriptlet: cjson-0:1.7.15-4.fc40.aarch64 >>> Stop post-install scriptlet: cjson-0:1.7.15-4.fc40.aarch64 [328/576] Installing librist-0:0.2.7-4. 100% | 5.5 MiB/s | 270.9 KiB | 00m00s [329/576] Installing mpg123-libs-0:1.31 100% | 209.0 MiB/s | 1.0 MiB | 00m00s [330/576] Installing libopenmpt-0:0.7.3 100% | 267.1 MiB/s | 1.6 MiB | 00m00s [331/576] Installing libudfread-0:1.1.2 100% | 218.2 MiB/s | 223.4 KiB | 00m00s [332/576] Installing mesa-filesystem-0: 100% | 4.2 MiB/s | 4.3 KiB | 00m00s [333/576] Installing soxr-0:0.1.3-15.fc 100% | 227.1 MiB/s | 465.2 KiB | 00m00s [334/576] Installing highway-0:1.1.0-1. 
100% | 258.9 MiB/s | 795.3 KiB | 00m00s
[335/576] Installing libjxl-1:0.8.2-6.f 100% | 301.4 MiB/s | 2.1 MiB | 00m00s
[336/576] Installing lpcnetfreedv-0:0.5 100% | 414.5 MiB/s | 14.9 MiB | 00m00s
[337/576] Installing codec2-0:1.2.0-4.f 100% | 274.6 MiB/s | 1.4 MiB | 00m00s
[338/576] Installing MUMPS-common-0:5.6 100% | 463.4 MiB/s | 949.0 KiB | 00m00s
[339/576] Installing MUMPS-0:5.6.2-3.fc 100% | 363.3 MiB/s | 8.4 MiB | 00m00s
[340/576] Installing coin-or-Clp-0:1.17 100% | 205.8 MiB/s | 2.7 MiB | 00m00s
[341/576] Installing coin-or-Cgl-0:0.60 100% | 243.2 MiB/s | 996.3 KiB | 00m00s
[342/576] Installing coin-or-Cbc-0:2.10 100% | 292.5 MiB/s | 2.6 MiB | 00m00s
[343/576] Installing pcre2-utf32-0:10.4 100% | 189.7 MiB/s | 582.8 KiB | 00m00s
[344/576] Installing pcre2-devel-0:10.4 100% | 192.7 MiB/s | 1.9 MiB | 00m00s
[345/576] Installing libcbor-0:0.11.0-1 100% | 198.5 MiB/s | 203.3 KiB | 00m00s
[346/576] Installing libfido2-0:1.14.0- 100% | 167.7 MiB/s | 343.4 KiB | 00m00s
[347/576] Installing gc-0:8.2.2-6.fc40. 100% | 166.6 MiB/s | 852.9 KiB | 00m00s
[348/576] Installing guile30-0:3.0.7-12 100% | 117.2 MiB/s | 52.1 MiB | 00m00s
[349/576] Installing make-1:4.4.1-6.fc4 100% | 185.1 MiB/s | 1.9 MiB | 00m00s
[350/576] Installing poppler-data-0:0.4 100% | 111.6 MiB/s | 12.4 MiB | 00m00s
[351/576] Installing libdatrie-0:0.2.13 100% | 217.8 MiB/s | 223.0 KiB | 00m00s
[352/576] Installing libthai-0:0.1.29-8 100% | 228.8 MiB/s | 937.2 KiB | 00m00s
[353/576] Installing tbb2020.3-0:2020.3 100% | 137.9 MiB/s | 282.5 KiB | 00m00s
[354/576] Installing qt-settings-0:40.0 100% | 1.6 MiB/s | 1.7 KiB | 00m00s
[355/576] Installing qt5-qtbase-common- 100% | 49.7 KiB/s | 356.0 B | 00m00s
>>> Running pre-install scriptlet: qt5-qtbase-0:5.15.13-1.fc40.aarch64
>>> Stop pre-install scriptlet: qt5-qtbase-0:5.15.13-1.fc40.aarch64
[356/576] Installing qt5-qtbase-0:5.15. 100% | 84.4 MiB/s | 11.5 MiB | 00m00s
>>> Running post-install scriptlet: qt5-qtbase-0:5.15.13-1.fc40.aarch64
>>> Stop post-install scriptlet: qt5-qtbase-0:5.15.13-1.fc40.aarch64
[357/576] Installing jbigkit-libs-0:2.1 100% | 214.6 MiB/s | 439.5 KiB | 00m00s
[358/576] Installing libtiff-0:4.6.0-2. 100% | 276.5 MiB/s | 1.7 MiB | 00m00s
[359/576] Installing proj-0:9.3.1-3.fc4 100% | 115.9 MiB/s | 5.2 MiB | 00m00s
[360/576] Installing libgeotiff-0:1.7.1 100% | 268.3 MiB/s | 1.1 MiB | 00m00s
[361/576] Installing libspatialite-0:5. 100% | 71.9 MiB/s | 15.7 MiB | 00m00s
[362/576] Installing libusb1-0:1.0.27-1 100% | 119.1 MiB/s | 243.8 KiB | 00m00s
[363/576] Installing libraw1394-0:2.1.2 100% | 269.0 MiB/s | 826.3 KiB | 00m00s
[364/576] Installing libdc1394-0:2.2.7- 100% | 217.2 MiB/s | 444.8 KiB | 00m00s
[365/576] Installing librabbitmq-0:0.13 100% | 194.2 MiB/s | 198.8 KiB | 00m00s
[366/576] Installing libmodplug-1:0.8.9 100% | 201.6 MiB/s | 413.0 KiB | 00m00s
[367/576] Installing game-music-emu-0:0 100% | 177.7 MiB/s | 363.8 KiB | 00m00s
[368/576] Installing xvidcore-0:1.3.7-1 100% | 242.9 MiB/s | 746.2 KiB | 00m00s
[369/576] Installing vo-amrwbenc-0:0.1. 100% | 237.6 MiB/s | 243.3 KiB | 00m00s
[370/576] Installing twolame-libs-0:0.4 100% | 217.5 MiB/s | 222.7 KiB | 00m00s
[371/576] Installing speex-0:1.2.0-17.f 100% | 197.7 MiB/s | 202.4 KiB | 00m00s
[372/576] Installing opencore-amr-0:0.1 100% | 269.4 MiB/s | 551.7 KiB | 00m00s
[373/576] Installing libvpx-0:1.14.0-1. 100% | 264.0 MiB/s | 2.6 MiB | 00m00s
[374/576] Installing lame-libs-0:3.100- 100% | 321.0 MiB/s | 1.3 MiB | 00m00s
[375/576] Installing ilbc-0:3.0.4-10.fc 100% | 204.0 MiB/s | 208.9 KiB | 00m00s
[376/576] Installing gsm-0:1.0.22-6.fc4 100% | 201.5 MiB/s | 206.4 KiB | 00m00s
[377/576] Installing fdk-aac-free-0:2.0 100% | 213.8 MiB/s | 656.9 KiB | 00m00s
[378/576] Installing orc-0:0.4.38-2.fc4 100% | 292.5 MiB/s | 1.2 MiB | 00m00s
[379/576] Installing libwayland-egl-0:1 100% | 193.0 MiB/s | 197.6 KiB | 00m00s
[380/576] Installing libvisual-1:0.4.1- 100% | 180.3 MiB/s | 553.8 KiB | 00m00s
[381/576] Installing cdparanoia-libs-0: 100% | 192.9 MiB/s | 395.0 KiB | 00m00s
[382/576] Installing alsa-lib-0:1.2.11- 100% | 206.6 MiB/s | 1.9 MiB | 00m00s
[383/576] Installing libsepol-devel-0:3 100% | 41.6 MiB/s | 127.7 KiB | 00m00s
[384/576] Installing libselinux-devel-0 100% | 26.1 MiB/s | 160.6 KiB | 00m00s
[385/576] Installing vim-filesystem-2:9 100% | 4.6 MiB/s | 4.7 KiB | 00m00s
[386/576] Installing emacs-filesystem-1 100% | 0.0 B/s | 544.0 B | 00m00s
[387/576] Installing openssh-0:9.6p1-1. 100% | 332.0 MiB/s | 2.0 MiB | 00m00s
[388/576] Installing openssh-clients-0: 100% | 5.9 MiB/s | 3.5 MiB | 00m01s
>>> Running post-install scriptlet: openssh-clients-0:9.6p1-1.fc40.2.aarch64
>>> Stop post-install scriptlet: openssh-clients-0:9.6p1-1.fc40.2.aarch64
[389/576] Installing fribidi-0:1.0.13-4 100% | 219.9 MiB/s | 675.6 KiB | 00m00s
[390/576] Installing libpaper-1:2.1.1-3 100% | 221.2 MiB/s | 226.5 KiB | 00m00s
[391/576] Installing libijs-0:0.35-22.f 100% | 225.2 MiB/s | 230.6 KiB | 00m00s
[392/576] Installing jbig2dec-libs-0:0. 100% | 147.7 MiB/s | 302.6 KiB | 00m00s
[393/576] Installing adobe-mappings-pdf 100% | 274.8 MiB/s | 4.4 MiB | 00m00s
[394/576] Installing libX11-common-0:1. 100% | 91.2 MiB/s | 1.2 MiB | 00m00s
[395/576] Installing libX11-0:1.8.7-3.f 100% | 268.5 MiB/s | 1.3 MiB | 00m00s
[396/576] Installing libXext-0:1.3.6-1. 100% | 206.2 MiB/s | 211.1 KiB | 00m00s
[397/576] Installing libXrender-0:0.9.1 100% | 194.6 MiB/s | 199.3 KiB | 00m00s
[398/576] Installing libXfixes-0:6.0.1- 100% | 195.0 MiB/s | 199.7 KiB | 00m00s
[399/576] Installing libXcursor-0:1.2.1 100% | 194.5 MiB/s | 199.1 KiB | 00m00s
[400/576] Installing libXi-0:1.8.1-5.fc 100% | 196.9 MiB/s | 201.6 KiB | 00m00s
[401/576] Installing libXv-0:1.0.12-3.f 100% | 194.6 MiB/s | 199.3 KiB | 00m00s
[402/576] Installing libvdpau-0:1.5-6.f 100% | 193.8 MiB/s | 198.4 KiB | 00m00s
[403/576] Installing libXxf86vm-0:1.1.5 100% | 193.9 MiB/s | 198.5 KiB | 00m00s
[404/576] Installing libglvnd-glx-1:1.7 100% | 444.9 MiB/s | 1.3 MiB | 00m00s
[405/576] Installing mesa-libGL-0:24.0. 100% | 236.5 MiB/s | 726.6 KiB | 00m00s
[406/576] Installing libva-0:2.21.0-3.f 100% | 370.1 MiB/s | 1.1 MiB | 00m00s
[407/576] Installing libavutil-free-0:6 100% | 304.4 MiB/s | 935.1 KiB | 00m00s
[408/576] Installing libswscale-free-0: 100% | 235.3 MiB/s | 481.8 KiB | 00m00s
[409/576] Installing libswresample-free 100% | 214.7 MiB/s | 219.9 KiB | 00m00s
[410/576] Installing glx-utils-0:9.0.0- 100% | 414.1 MiB/s | 848.1 KiB | 00m00s
[411/576] Installing libGLEW-0:2.2.0-7. 100% | 164.4 MiB/s | 841.5 KiB | 00m00s
[412/576] Installing libX11-devel-0:1.8 100% | 53.3 MiB/s | 1.1 MiB | 00m00s
[413/576] Installing libXpm-0:3.5.17-3. 100% | 129.8 MiB/s | 265.8 KiB | 00m00s
[414/576] Installing libXt-0:1.3.0-3.fc 100% | 296.3 MiB/s | 606.8 KiB | 00m00s
[415/576] Installing graphite2-0:1.3.14 100% | 243.1 MiB/s | 497.9 KiB | 00m00s
[416/576] Installing netpbm-0:11.02.00- 100% | 205.4 MiB/s | 630.9 KiB | 00m00s
[417/576] Installing gts-0:0.7.6-48.201 100% | 400.9 MiB/s | 2.4 MiB | 00m00s
[418/576] Installing libimagequant-0:4. 100% | 238.3 MiB/s | 732.1 KiB | 00m00s
[419/576] Installing pixman-0:0.43.0-3. 100% | 234.2 MiB/s | 719.4 KiB | 00m00s
[420/576] Installing cairo-0:1.18.0-3.f 100% | 281.4 MiB/s | 2.0 MiB | 00m00s
[421/576] Installing harfbuzz-0:8.3.0-5 100% | 293.5 MiB/s | 2.9 MiB | 00m00s
[422/576] Installing freetype-0:2.13.2- 100% | 184.5 MiB/s | 944.6 KiB | 00m00s
[423/576] Installing fontconfig-0:2.15. 100% | 2.0 MiB/s | 2.4 MiB | 00m01s
>>> Running post-install scriptlet: fontconfig-0:2.15.0-4.fc40.aarch64
>>> Stop post-install scriptlet: fontconfig-0:2.15.0-4.fc40.aarch64
[424/576] Installing cairo-gobject-0:1. 100% | 191.4 MiB/s | 196.0 KiB | 00m00s
[425/576] Installing gd-0:2.3.3-16.fc40 100% | 126.1 MiB/s | 516.7 KiB | 00m00s
[426/576] Installing libgs-0:10.02.1-8. 100% | 422.4 MiB/s | 23.7 MiB | 00m00s
[427/576] Installing libXft-0:2.3.8-6.f 100% | 125.9 MiB/s | 257.9 KiB | 00m00s
[428/576] Installing pango-0:1.51.2-1.f 100% | 272.9 MiB/s | 1.9 MiB | 00m00s
[429/576] Installing librsvg2-0:2.57.1- 100% | 316.6 MiB/s | 4.4 MiB | 00m00s
[430/576] Installing rsvg-pixbuf-loader 100% | 191.9 MiB/s | 196.5 KiB | 00m00s
[431/576] Installing libavcodec-free-0: 100% | 320.2 MiB/s | 9.6 MiB | 00m00s
[432/576] Installing libchromaprint-0:1 100% | 102.6 MiB/s | 210.0 KiB | 00m00s
[433/576] Installing gdk-pixbuf2-module 100% | 421.3 MiB/s | 2.1 MiB | 00m00s
[434/576] Installing openslide-0:4.0.0- 100% | 211.0 MiB/s | 432.2 KiB | 00m00s
[435/576] Installing lasi-0:1.1.3-13.fc 100% | 126.9 MiB/s | 259.9 KiB | 00m00s
[436/576] Installing libbluray-0:1.3.4- 100% | 241.9 MiB/s | 495.3 KiB | 00m00s
[437/576] Installing libverto-devel-0:0 100% | 25.7 MiB/s | 26.4 KiB | 00m00s
[438/576] Installing libkadm5-0:1.21.2- 100% | 224.6 MiB/s | 460.1 KiB | 00m00s
[439/576] Installing libcom_err-devel-0 100% | 17.8 MiB/s | 18.3 KiB | 00m00s
[440/576] Installing keyutils-libs-deve 100% | 27.0 MiB/s | 55.2 KiB | 00m00s
[441/576] Installing krb5-devel-0:1.21. 100% | 174.8 MiB/s | 715.9 KiB | 00m00s
[442/576] Installing hwloc-libs-0:2.10. 100% | 415.4 MiB/s | 2.9 MiB | 00m00s
[443/576] Installing tbb-bind-0:2021.11 100% | 191.9 MiB/s | 196.5 KiB | 00m00s
[444/576] Installing liburing-0:2.5-3.f 100% | 205.4 MiB/s | 420.7 KiB | 00m00s
[445/576] Installing rocksdb-0:8.10.0-3 100% | 286.6 MiB/s | 8.9 MiB | 00m00s
[446/576] Installing tzdata-0:2024a-4.f 100% | 38.8 MiB/s | 1.9 MiB | 00m00s
[447/576] Installing python-pip-wheel-0 100% | 506.6 MiB/s | 1.5 MiB | 00m00s
[448/576] Installing mpdecimal-0:2.5.1- 100% | 161.0 MiB/s | 329.8 KiB | 00m00s
[449/576] Installing libb2-0:0.98.1-11. 100% | 28.4 MiB/s | 203.2 KiB | 00m00s
[450/576] Installing python3-libs-0:3.1 100% | 292.2 MiB/s | 52.3 MiB | 00m00s
[451/576] Installing python3-0:3.12.2-2 100% | 104.3 MiB/s | 213.5 KiB | 00m00s
[452/576] Installing gstreamer1-0:1.22. 100% | 293.5 MiB/s | 6.7 MiB | 00m00s
[453/576] Installing gstreamer1-plugins 100% | 338.6 MiB/s | 12.5 MiB | 00m00s
[454/576] Installing cmake-rpm-macros-0 100% | 211.7 KiB/s | 8.0 KiB | 00m00s
[455/576] Installing python3-six-0:1.16 100% | 3.4 MiB/s | 120.1 KiB | 00m00s
[456/576] Installing onnx-optimizer-0:0 100% | 244.5 MiB/s | 1.0 MiB | 00m00s
[457/576] Installing libwacom-0:2.10.0- 100% | 199.5 MiB/s | 408.5 KiB | 00m00s
[458/576] Installing libinput-0:1.25.0- 100% | 98.1 MiB/s | 1.7 MiB | 00m00s
>>> Running post-install scriptlet: libinput-0:1.25.0-3.fc40.aarch64
>>> Stop post-install scriptlet: libinput-0:1.25.0-3.fc40.aarch64
[459/576] Installing qt5-qtbase-gui-0:5 100% | 349.1 MiB/s | 24.4 MiB | 00m00s
[460/576] Installing crypto-policies-sc 100% | 29.2 MiB/s | 328.7 KiB | 00m00s
[461/576] Installing nss-sysinit-0:3.98 100% | 97.4 MiB/s | 199.4 KiB | 00m00s
[462/576] Installing nss-0:3.98.0-1.fc4 100% | 167.1 MiB/s | 2.2 MiB | 00m00s
>>> Running post-install scriptlet: nss-0:3.98.0-1.fc40.aarch64
>>> Stop post-install scriptlet: nss-0:3.98.0-1.fc40.aarch64
[463/576] Installing poppler-0:24.02.0- 100% | 279.9 MiB/s | 3.9 MiB | 00m00s
[464/576] Installing poppler-glib-0:24. 100% | 93.0 MiB/s | 666.8 KiB | 00m00s
[465/576] Installing graphviz-0:9.0.0-1 100% | 67.9 MiB/s | 27.6 MiB | 00m00s
[466/576] Installing gdal-libs-0:3.8.4- 100% | 315.7 MiB/s | 25.9 MiB | 00m00s
[467/576] Installing vtk-0:9.2.6-12.fc4 100% | 375.4 MiB/s | 113.4 MiB | 00m00s
[468/576] Installing python3-packaging- 100% | 140.6 MiB/s | 431.9 KiB | 00m00s
[469/576] Installing python3-rpm-genera 100% | 81.0 MiB/s | 82.9 KiB | 00m00s
[470/576] Installing vapoursynth-libs-0 100% | 232.7 MiB/s | 1.2 MiB | 00m00s
[471/576] Installing libavformat-free-0 100% | 268.1 MiB/s | 2.7 MiB | 00m00s
[472/576] Installing opencv-cuda-0:4.9. 100% | 129.2 MiB/s | 573.5 MiB | 00m04s
[473/576] Installing opencv-core-0:4.9. 100% | 97.2 MiB/s | 50.0 MiB | 00m01s
[474/576] Installing opencv-0:4.9.0-202 100% | 132.2 MiB/s | 22.0 MiB | 00m00s
[475/576] Installing opencv-contrib-0:4 100% | 88.0 MiB/s | 17.9 MiB | 00m00s
[476/576] Installing opencv-static-0:4. 100% | 11.4 MiB/s | 2.5 MiB | 00m00s
[477/576] Installing opencv-devel-0:4.9 100% | 22.8 MiB/s | 10.9 MiB | 00m00s
[478/576] Installing rhash-0:1.4.3-4.fc 100% | 144.0 MiB/s | 589.8 KiB | 00m00s
[479/576] Installing cmake-0:3.28.2-1.f 100% | 336.1 MiB/s | 28.6 MiB | 00m00s
[480/576] Installing cmake-data-0:3.28. 100% | 86.6 MiB/s | 8.5 MiB | 00m00s
[481/576] Installing pybind11-devel-0:2 100% | 119.5 MiB/s | 856.4 KiB | 00m00s
[482/576] Installing libglvnd-core-deve 100% | 40.1 MiB/s | 41.1 KiB | 00m00s
[483/576] Installing libglvnd-devel-1:1 100% | 424.1 MiB/s | 2.1 MiB | 00m00s
[484/576] Installing less-0:643-4.fc40.
100% | 196.2 MiB/s | 803.6 KiB | 00m00s
[485/576] Installing git-core-0:2.44.0- 100% | 364.1 MiB/s | 21.8 MiB | 00m00s
[486/576] Installing git-core-doc-0:2.4 100% | 261.3 MiB/s | 17.0 MiB | 00m00s
[487/576] Installing perl-Git-0:2.44.0- 100% | 1.2 MiB/s | 65.0 KiB | 00m00s
[488/576] Installing git-0:2.44.0-1.fc4 100% | 882.9 KiB/s | 87.4 KiB | 00m00s
[489/576] Installing libubsan-0:14.0.1- 100% | 263.7 MiB/s | 540.1 KiB | 00m00s
[490/576] Installing libatomic-0:14.0.1 100% | 193.1 MiB/s | 197.8 KiB | 00m00s
[491/576] Installing libasan-0:14.0.1-0 100% | 30.8 MiB/s | 1.6 MiB | 00m00s
[492/576] Installing gcc-0:14.0.1-0.13. 100% | 76.4 MiB/s | 93.3 MiB | 00m01s
>>> Running trigger-install scriptlet: redhat-rpm-config-0:286-1.fc40.noarch
>>> Stop trigger-install scriptlet: redhat-rpm-config-0:286-1.fc40.noarch
[493/576] Installing sleef-0:3.6-202403 100% | 254.4 MiB/s | 1.5 MiB | 00m00s
[494/576] Installing cuda-nvvm-12-3-0:1 100% | 244.3 MiB/s | 58.1 MiB | 00m00s
[495/576] Installing cuda-crt-12-3-0:12 100% | 328.4 MiB/s | 1.0 MiB | 00m00s
[496/576] Installing asmjit-1:0-2022070 100% | 150.6 MiB/s | 462.6 KiB | 00m00s
[497/576] Installing libyaml-0:0.2.5-14 100% | 257.7 MiB/s | 263.9 KiB | 00m00s
[498/576] Installing zlib-ng-compat-dev 100% | 102.0 MiB/s | 104.5 KiB | 00m00s
[499/576] Installing opencl-headers-0:3 100% | 354.2 MiB/s | 725.3 KiB | 00m00s
[500/576] Installing numactl-libs-0:2.0 100% | 193.2 MiB/s | 197.8 KiB | 00m00s
[501/576] Installing miniz-0:3.0.2-5.fc 100% | 54.0 MiB/s | 221.3 KiB | 00m00s
[502/576] Installing gl-manpages-0:1.1- 100% | 45.9 MiB/s | 1.1 MiB | 00m00s
[503/576] Installing gmp-c++-1:6.2.1-8. 100% | 3.4 MiB/s | 196.4 KiB | 00m00s
[504/576] Installing gmp-devel-1:6.2.1- 100% | 5.2 MiB/s | 358.2 KiB | 00m00s
[505/576] Installing libstdc++-devel-0: 100% | 167.2 MiB/s | 15.2 MiB | 00m00s
[506/576] Installing gcc-c++-0:14.0.1-0 100% | 49.8 MiB/s | 35.0 MiB | 00m01s
[507/576] Installing cuda-nvcc-12-3-0:1 100% | 77.9 MiB/s | 171.7 MiB | 00m02s
[508/576] Installing fp16-1:0-20240410. 100% | 97.4 MiB/s | 199.5 KiB | 00m00s
[509/576] Installing foxi-0:0-20210526. 100% | 34.1 MiB/s | 69.9 KiB | 00m00s
[510/576] Installing xapian-core-libs-0 100% | 92.0 MiB/s | 2.1 MiB | 00m00s
[511/576] Installing cuda-nvtx-12-3-0:1 100% | 101.3 MiB/s | 414.8 KiB | 00m00s
[512/576] Installing cuda-driver-devel- 100% | 4.3 MiB/s | 122.9 KiB | 00m00s
[513/576] Installing cutlass-0:3.4.1-20 100% | 117.8 MiB/s | 1.0 GiB | 00m09s
[514/576] Installing cuda-cupti-12-3-0: 100% | 34.6 MiB/s | 50.4 MiB | 00m01s
[515/576] Installing kineto-0:0.4.0-202 100% | 192.4 MiB/s | 788.2 KiB | 00m00s
[516/576] Installing kineto-devel-0:0.4 100% | 6.4 MiB/s | 52.8 KiB | 00m00s
[517/576] Installing cutlass-devel-0:3. 100% | 197.4 MiB/s | 12.2 MiB | 00m00s
[518/576] Installing doxygen-2:1.10.0-3 100% | 130.6 MiB/s | 19.5 MiB | 00m00s
[519/576] Installing foxi-devel-0:0-202 100% | 59.4 MiB/s | 121.7 KiB | 00m00s
[520/576] Installing fp16-devel-1:0-202 100% | 30.5 MiB/s | 31.2 KiB | 00m00s
[521/576] Installing mpfr-devel-0:4.2.1 100% | 31.0 MiB/s | 63.5 KiB | 00m00s
[522/576] Installing mesa-libGLU-devel- 100% | 17.1 MiB/s | 17.5 KiB | 00m00s
[523/576] Installing miniz-devel-0:3.0. 100% | 101.7 MiB/s | 104.1 KiB | 00m00s
[524/576] Installing numactl-devel-0:2. 100% | 13.1 MiB/s | 26.8 KiB | 00m00s
[525/576] Installing ocl-icd-devel-0:2. 100% | 65.8 MiB/s | 337.0 KiB | 00m00s
[526/576] Installing protobuf-compat-de 100% | 153.2 MiB/s | 2.8 MiB | 00m00s
[527/576] Installing python3-pyyaml-0:6 100% | 85.2 MiB/s | 872.1 KiB | 00m00s
[528/576] Installing asmjit-devel-1:0-2 100% | 153.3 MiB/s | 1.5 MiB | 00m00s
[529/576] Installing sleef-devel-0:3.6- 100% | 189.7 MiB/s | 194.2 KiB | 00m00s
[530/576] Installing annobin-plugin-gcc 100% | 63.7 MiB/s | 1.1 MiB | 00m00s
>>> Running trigger-install scriptlet: redhat-rpm-config-0:286-1.fc40.noarch
>>> Stop trigger-install scriptlet: redhat-rpm-config-0:286-1.fc40.noarch
[531/576] Installing gcc-plugin-annobin 100% | 9.2 MiB/s | 198.6 KiB | 00m00s
>>> Running trigger-install scriptlet: redhat-rpm-config-0:286-1.fc40.noarch
>>> Stop trigger-install scriptlet: redhat-rpm-config-0:286-1.fc40.noarch
[532/576] Installing python3-pybind11-0 100% | 70.3 MiB/s | 863.3 KiB | 00m00s
[533/576] Installing python3-devel-0:3. 100% | 85.0 MiB/s | 1.3 MiB | 00m00s
[534/576] Installing onnx-optimizer-dev 100% | 40.0 MiB/s | 205.0 KiB | 00m00s
[535/576] Installing peachpy-python3-0: 100% | 241.5 MiB/s | 13.3 MiB | 00m00s
[536/576] Installing python3-numpy-1:1. 100% | 159.2 MiB/s | 41.9 MiB | 00m00s
[537/576] Installing python3-setuptools 100% | 121.7 MiB/s | 7.3 MiB | 00m00s
[538/576] Installing python3-typing-ext 100% | 128.1 MiB/s | 393.4 KiB | 00m00s
[539/576] Installing rocksdb-devel-0:8. 100% | 101.3 MiB/s | 1.4 MiB | 00m00s
[540/576] Installing tbb-devel-0:2021.1 100% | 89.6 MiB/s | 1.3 MiB | 00m00s
[541/576] Installing zeromq-devel-0:4.3 100% | 15.2 MiB/s | 31.1 KiB | 00m00s
[542/576] Installing cuda-gcc-12-c++-0: 100% | 307.0 MiB/s | 57.4 MiB | 00m00s
[543/576] Installing cuda-cudart-devel- 100% | 65.2 MiB/s | 6.5 MiB | 00m00s
[544/576] Installing rdma-core-devel-0: 100% | 15.4 MiB/s | 677.6 KiB | 00m00s
[545/576] Installing openblas-devel-0:0 100% | 17.8 MiB/s | 1.7 MiB | 00m00s
[546/576] Installing libcusolver-devel- 100% | 11.0 MiB/s | 463.5 KiB | 00m00s
[547/576] Installing libnvjitlink-devel 100% | 38.0 MiB/s | 55.5 MiB | 00m01s
[548/576] Installing leveldb-devel-0:1. 100% | 15.4 MiB/s | 142.4 KiB | 00m00s
[549/576] Installing tensorpipe-devel-0 100% | 45.8 MiB/s | 516.1 KiB | 00m00s
[550/576] Installing glog-devel-0:0.3.5 100% | 111.0 MiB/s | 113.6 KiB | 00m00s
[551/576] Installing qnnpack-devel-0:0- 100% | 18.4 MiB/s | 18.8 KiB | 00m00s
[552/576] Installing nnpack-devel-0:0-2 100% | 6.1 MiB/s | 43.7 KiB | 00m00s
[553/576] Installing lmdb-devel-0:0.9.3 100% | 6.5 MiB/s | 73.0 KiB | 00m00s
[554/576] Installing magma-devel-0:2.8. 100% | 162.4 MiB/s | 21.9 MiB | 00m00s
[555/576] Installing fftw-devel-0:3.3.1 100% | 46.8 MiB/s | 287.8 KiB | 00m00s
[556/576] Installing gloo-devel-1:0.5.0 100% | 48.0 MiB/s | 344.1 KiB | 00m00s
[557/576] Installing flatbuffers-compil 100% | 91.7 MiB/s | 2.5 MiB | 00m00s
[558/576] Installing flatbuffers-devel- 100% | 115.1 MiB/s | 471.4 KiB | 00m00s
[559/576] Installing hiredis-devel-0:1. 100% | 59.3 MiB/s | 121.4 KiB | 00m00s
[560/576] Installing libnccl-devel-0:2. 100% | 755.9 KiB/s | 46.1 KiB | 00m00s
>>> Running post-install scriptlet: libnccl-devel-0:2.21.5-1+cuda12.4.aarch64
>>> Stop post-install scriptlet: libnccl-devel-0:2.21.5-1+cuda12.4.aarch64
[561/576] Installing libcurand-devel-12 100% | 57.4 MiB/s | 93.8 MiB | 00m02s
[562/576] Installing onnx-devel-0:1.17. 100% | 105.2 MiB/s | 1.1 MiB | 00m00s
[563/576] Installing cpuinfo-devel-1:0- 100% | 40.1 MiB/s | 82.1 KiB | 00m00s
[564/576] Installing cuda-nvrtc-devel-1 100% | 70.1 MiB/s | 72.8 MiB | 00m01s
[565/576] Installing pthreadpool-devel- 100% | 99.1 MiB/s | 101.5 KiB | 00m00s
[566/576] Installing libcudnn8-devel-0: 100% | 17.8 MiB/s | 200.7 KiB | 00m00s
>>> Running post-install scriptlet: libcudnn8-devel-0:8.9.7.29-2.cuda12.3.aarch6
>>> Stop post-install scriptlet: libcudnn8-devel-0:8.9.7.29-2.cuda12.3.aarch64
[567/576] Installing snappy-devel-0:1.1 100% | 7.7 MiB/s | 47.4 KiB | 00m00s
[568/576] Installing eigen3-devel-0:3.4 100% | 151.2 MiB/s | 8.5 MiB | 00m00s
[569/576] Installing neon2sse-devel-0:0 100% | 261.6 MiB/s | 803.5 KiB | 00m00s
[570/576] Installing systemd-rpm-macros 100% | 9.8 MiB/s | 10.0 KiB | 00m00s
[571/576] Installing psimd-devel-1:0-20 100% | 45.4 MiB/s | 46.4 KiB | 00m00s
[572/576] Installing fxdiv-devel-1:0-20 100% | 17.3 MiB/s | 17.7 KiB | 00m00s
[573/576] Installing cuda-profiler-api- 100% | 71.1 MiB/s | 72.8 KiB | 00m00s
[574/576] Installing cuda-nvml-devel-12 100% | 93.2 MiB/s | 667.7 KiB | 00m00s
[575/576] Installing libzstd-devel-0:1. 100% | 64.7 MiB/s | 198.9 KiB | 00m00s
[576/576] Installing gemmlowp-devel-0:0 100% | 1.4 MiB/s | 2.3 MiB | 00m02s
>>> Running post-transaction scriptlet: cuda-toolkit-12-3-config-common-0:12.3.1
>>> Stop post-transaction scriptlet: cuda-toolkit-12-3-config-common-0:12.3.101-
>>> Running post-transaction scriptlet: urw-base35-bookman-fonts-0:20200910-19.f
>>> Stop post-transaction scriptlet: urw-base35-bookman-fonts-0:20200910-19.fc40
>>> Running post-transaction scriptlet: urw-base35-c059-fonts-0:20200910-19.fc40
>>> Stop post-transaction scriptlet: urw-base35-c059-fonts-0:20200910-19.fc40.no
>>> Running post-transaction scriptlet: urw-base35-d050000l-fonts-0:20200910-19.
>>> Stop post-transaction scriptlet: urw-base35-d050000l-fonts-0:20200910-19.fc4
>>> Running post-transaction scriptlet: urw-base35-gothic-fonts-0:20200910-19.fc
>>> Stop post-transaction scriptlet: urw-base35-gothic-fonts-0:20200910-19.fc40.
>>> Running post-transaction scriptlet: urw-base35-nimbus-mono-ps-fonts-0:202009
>>> Stop post-transaction scriptlet: urw-base35-nimbus-mono-ps-fonts-0:20200910-
>>> Running post-transaction scriptlet: urw-base35-nimbus-roman-fonts-0:20200910
>>> Stop post-transaction scriptlet: urw-base35-nimbus-roman-fonts-0:20200910-19
>>> Running post-transaction scriptlet: urw-base35-nimbus-sans-fonts-0:20200910-
>>> Stop post-transaction scriptlet: urw-base35-nimbus-sans-fonts-0:20200910-19.
>>> Running post-transaction scriptlet: urw-base35-p052-fonts-0:20200910-19.fc40
>>> Stop post-transaction scriptlet: urw-base35-p052-fonts-0:20200910-19.fc40.no
>>> Running post-transaction scriptlet: urw-base35-standard-symbols-ps-fonts-0:2
>>> Stop post-transaction scriptlet: urw-base35-standard-symbols-ps-fonts-0:2020
>>> Running post-transaction scriptlet: urw-base35-z003-fonts-0:20200910-19.fc40
>>> Stop post-transaction scriptlet: urw-base35-z003-fonts-0:20200910-19.fc40.no
>>> Running post-transaction scriptlet: fontconfig-0:2.15.0-4.fc40.aarch64
>>> Stop post-transaction scriptlet: fontconfig-0:2.15.0-4.fc40.aarch64
>>> Running post-transaction scriptlet: crypto-policies-scripts-0:20240201-2.git
>>> Stop post-transaction scriptlet: crypto-policies-scripts-0:20240201-2.git9f5
>>> Running post-transaction scriptlet: nss-0:3.98.0-1.fc40.aarch64
>>> Stop post-transaction scriptlet: nss-0:3.98.0-1.fc40.aarch64
>>> Running trigger-install scriptlet: glibc-common-0:2.39.9999-99.fc40.aarch64
>>> Stop trigger-install scriptlet: glibc-common-0:2.39.9999-99.fc40.aarch64
>>> Running trigger-install scriptlet: info-0:7.1-2.fc40.aarch64
>>> Stop trigger-install scriptlet: info-0:7.1-2.fc40.aarch64
>>> Running trigger-install scriptlet: glib2-0:2.80.0-1.fc40.aarch64
>>> Stop trigger-install scriptlet: glib2-0:2.80.0-1.fc40.aarch64
>>> Running trigger-install scriptlet: shared-mime-info-0:2.3-4.fc40.aarch64
>>> Stop trigger-install scriptlet: shared-mime-info-0:2.3-4.fc40.aarch64
>>> Running trigger-install scriptlet: gdk-pixbuf2-0:2.42.10-8.fc40.aarch64
>>> Stop trigger-install scriptlet: gdk-pixbuf2-0:2.42.10-8.fc40.aarch64
>>> Running trigger-install scriptlet: systemd-0:255.4-1.fc40.aarch64
>>> Stop trigger-install scriptlet: systemd-0:255.4-1.fc40.aarch64
>>> Running trigger-install scriptlet: systemd-0:255.4-1.fc40.aarch64
>>> Stop trigger-install scriptlet: systemd-0:255.4-1.fc40.aarch64
>>> Running trigger-install scriptlet: systemd-0:255.4-1.fc40.aarch64
>>> Stop trigger-install scriptlet: systemd-0:255.4-1.fc40.aarch64
>>> Running trigger-install scriptlet: systemd-0:255.4-1.fc40.aarch64
>>> Stop trigger-install scriptlet: systemd-0:255.4-1.fc40.aarch64
>>> Running trigger-install scriptlet: systemd-0:255.4-1.fc40.aarch64
>>> Stop trigger-install scriptlet: systemd-0:255.4-1.fc40.aarch64
>>> Running trigger-install scriptlet: systemd-0:255.4-1.fc40.aarch64
>>> Stop trigger-install scriptlet: systemd-0:255.4-1.fc40.aarch64
>>> Running trigger-install scriptlet: fontconfig-0:2.15.0-4.fc40.aarch64
>>> Stop trigger-install scriptlet: fontconfig-0:2.15.0-4.fc40.aarch64
>>> Running trigger-install scriptlet: graphviz-0:9.0.0-11.fc40.aarch64
>>> Stop trigger-install scriptlet: graphviz-0:9.0.0-11.fc40.aarch64
Warning: skipped PGP checks for 83 package(s).
Finish: build setup for pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.src.rpm
Start: rpmbuild pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.src.rpm
warning: %patchN is deprecated (2 usages found), use %patch N (or %patch -P N)
Building target platforms: aarch64
Building for target aarch64
setting SOURCE_DATE_EPOCH=1554595200
Executing(%prep): /bin/sh -e /var/tmp/rpm-tmp.Dxcaz0
+ umask 022
+ cd /builddir/build/BUILD
+ cd /builddir/build/BUILD
+ rm -rf pytorch
+ /usr/bin/mkdir -p pytorch
+ cd pytorch
+ rm -rf /builddir/build/BUILD/pytorch-SPECPARTS
+ /usr/bin/mkdir -p /builddir/build/BUILD/pytorch-SPECPARTS
+ /usr/bin/chmod -Rf a+rX,u+w,g-w,o-w .
+ git clone --depth 1 -n -b main https://github.com/pytorch/pytorch.git .
Cloning into '.'...
+ git fetch --depth 1 origin 7efaf54dc46034189cb36b345764a5a9a5b693d4
From https://github.com/pytorch/pytorch
 * branch 7efaf54dc46034189cb36b345764a5a9a5b693d4 -> FETCH_HEAD
+ git reset --hard 7efaf54dc46034189cb36b345764a5a9a5b693d4
HEAD is now at 7efaf54 Fakeifying views shouldnt create symbols when dynamic=False (#123348)
+ git submodule update --init --depth 1 third_party/fmt
Submodule 'third_party/fmt' (https://github.com/fmtlib/fmt.git) registered for path 'third_party/fmt'
Cloning into '/builddir/build/BUILD/pytorch/third_party/fmt'...
From https://github.com/fmtlib/fmt
 * branch e69e5f977d458f2650bb346dadf2ad30c5320281 -> FETCH_HEAD
Submodule path 'third_party/fmt': checked out 'e69e5f977d458f2650bb346dadf2ad30c5320281'
+ git submodule update --init --depth 1 third_party/XNNPACK
Submodule 'third_party/XNNPACK' (https://github.com/google/XNNPACK.git) registered for path 'third_party/XNNPACK'
Cloning into '/builddir/build/BUILD/pytorch/third_party/XNNPACK'...
From https://github.com/google/XNNPACK
 * branch fcbf55af6cf28a4627bcd1f703ab7ad843f0f3a2 -> FETCH_HEAD
Submodule path 'third_party/XNNPACK': checked out 'fcbf55af6cf28a4627bcd1f703ab7ad843f0f3a2'
+ git submodule update --init --depth 1 third_party/ittapi
Submodule 'third_party/ittapi' (https://github.com/intel/ittapi.git) registered for path 'third_party/ittapi'
Cloning into '/builddir/build/BUILD/pytorch/third_party/ittapi'...
From https://github.com/intel/ittapi
 * branch 5b8a7d7422611c3a0d799fb5fc5dd4abfae35b42 -> FETCH_HEAD
Submodule path 'third_party/ittapi': checked out '5b8a7d7422611c3a0d799fb5fc5dd4abfae35b42'
+ git submodule update --init --depth 1 third_party/pocketfft
Submodule 'third_party/pocketfft' (https://github.com/mreineck/pocketfft) registered for path 'third_party/pocketfft'
Cloning into '/builddir/build/BUILD/pytorch/third_party/pocketfft'...
From https://github.com/mreineck/pocketfft
 * branch 9d3ab05a7fffbc71a492bc6a17be034e83e8f0fe -> FETCH_HEAD
Submodule path 'third_party/pocketfft': checked out '9d3ab05a7fffbc71a492bc6a17be034e83e8f0fe'
+ git submodule update --init --depth 1 third_party/cudnn_frontend
Submodule 'third_party/cudnn_frontend' (https://github.com/NVIDIA/cudnn-frontend.git) registered for path 'third_party/cudnn_frontend'
Cloning into '/builddir/build/BUILD/pytorch/third_party/cudnn_frontend'...
From https://github.com/NVIDIA/cudnn-frontend
 * branch 150798fe976556078f443fdb059a1ff0361f58a2 -> FETCH_HEAD
Submodule path 'third_party/cudnn_frontend': checked out '150798fe976556078f443fdb059a1ff0361f58a2'
+ git --no-pager log --format=fuller
commit 7efaf54dc46034189cb36b345764a5a9a5b693d4
Author:     Brian Hirsh
AuthorDate: Thu Apr 11 08:19:28 2024 -0700
Commit:     PyTorch MergeBot
CommitDate: Fri Apr 12 01:12:23 2024 +0000

    Fakeifying views shouldnt create symbols when dynamic=False (#123348)

    Fixes https://github.com/pytorch/pytorch/issues/123298

    I was also seeing some crashes in torchtrain due to dynamic shapes, even when I set `compile(dynamic=False)` (cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @chenyang78 @kadeng @chauhang @wanchaol).

    This doesn't fix the underlying dynamic shape issues with compile + DTensor, but it does prevent dynamic shapes from leaking in.

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/123348
    Approved by: https://github.com/ezyang
    ghstack dependencies: #122502, #122751
Patch #1 (pytorch-C.patch):
+ echo 'Patch #1 (pytorch-C.patch):'
+ /usr/bin/patch --no-backup-if-mismatch -f -p0 -b --suffix .python~ --fuzz=100
patching file torch/CMakeLists.txt
Hunk #1 succeeded at 277 (offset -2 lines).
Patch #5 (pytorch-cuda12.patch):
+ echo 'Patch #5 (pytorch-cuda12.patch):'
+ /usr/bin/patch --no-backup-if-mismatch -f -p1 -b --suffix .cu12~ --fuzz=100
patching file aten/src/ATen/native/nested/cuda/NestedTensorMatmul.cu
patching file aten/src/ATen/native/nested/cuda/NestedTensorTransformerFunctions.cu
patching file aten/src/ATen/native/transformers/cuda/attention.cu
Hunk #1 succeeded at 1 with fuzz 3.
patching file aten/src/ATen/native/transformers/cuda/attention_backward.cu
Hunk #1 succeeded at 1 with fuzz 3.
patching file aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernel_backward.h
Hunk #1 succeeded at 1 with fuzz 3.
patching file aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernel_forward.h
Hunk #1 succeeded at 1 with fuzz 3.
patching file aten/src/ATen/native/transformers/cuda/flash_attn/flash_bwd_launch_template.h
Hunk #1 succeeded at 1 with fuzz 3.
patching file aten/src/ATen/native/transformers/cuda/flash_attn/flash_fwd_launch_template.h
Hunk #1 succeeded at 1 with fuzz 3.
+ sed -i -e 's|VERSION_LESS 3.7)|VERSION_LESS 3.6)|g' cmake/Dependencies.cmake
+ sed -i -e 's|PY_MAJOR_VERSION == 3|PY_MAJOR_VERSION == 3 \&\& PY_MINOR_VERSION > 6|' torch/csrc/dynamo/eval_frame.c
+ sed -i 's|CMAKE_CXX_STANDARD 14|CMAKE_CXX_STANDARD 17|' CMakeLists.txt
+ sed -i -e 's|torch_cpu PUBLIC c10|torch_cpu PUBLIC c10 qnnpack gloo gloo_cuda|' caffe2/CMakeLists.txt
+ sed -i -e 's|USE_SYSTEM_BIND11|USE_SYSTEM_PYBIND11|g' cmake/Dependencies.cmake
+ rm -rf 'third_party/pthreadpool/*'
+ touch third_party/pthreadpool/CMakeLists.txt
+ sed -i -e 's|NAMES openblas|NAMES openblaso openblas|' cmake/Modules/FindOpenBLAS.cmake
+ sed -i -e 's|USE_ZSTD|NOT_USE_ZSTD|g' cmake/Dependencies.cmake
+ sed -i -e 's|add_subdirectory(zstd)|list(APPEND Caffe2_PUBLIC_DEPENDENCY_LIBS zstd)|g' caffe2/share/contrib/CMakeLists.txt
+ sed -i -e 's|Caffe2_DEPENDENCY_LIBS onnx_proto onnx|Caffe2_DEPENDENCY_LIBS onnx_proto onnx onnx_optimizer|' cmake/Dependencies.cmake
+ mkdir -p third_party/tensorpipe
+ echo ''
+ sed -i '/add_dependencies(tensorpipe_agent tensorpipe)/d' caffe2/CMakeLists.txt
+ echo ''
+ echo 'set(NNPACK_FOUND TRUE)'
+ sed -i '/TARGET cpuinfo PROPERTY/d' cmake/Dependencies.cmake
+ sed -i '/APPEND Caffe2_DEPENDENCY_LIBS fp16/d' cmake/Dependencies.cmake
+ mkdir -p third_party/QNNPACK
+ echo ''
+ sed -i '/TARGET qnnpack PROPERTY/d' cmake/Dependencies.cmake
+ sed -i -e '/target_compile_options(qnnpack/d' cmake/Dependencies.cmake
+ mkdir -p third_party/psimd
+ echo ''
+ sed -i '/pytorch_qnnpack PRIVATE psimd/d' aten/src/ATen/native/quantized/cpu/qnnpack/CMakeLists.txt
+ sed -i '/NOT TARGET fxdiv/,/endif/d' caffe2/CMakeLists.txt
+ sed -i '/torch_cpu PRIVATE fxdiv/d' caffe2/CMakeLists.txt
+ sed -i '/pytorch_qnnpack PRIVATE fxdiv/d' aten/src/ATen/native/quantized/cpu/qnnpack/CMakeLists.txt
+ mkdir -p third_party/fbgemm
+ echo ''
+ sed -i '/(TARGET fbgemm/d' cmake/Dependencies.cmake
+ sed -i 's|caffe2_fakelowp_ops fbgemm cpuinfo|caffe2_fakelowp_ops|' caffe2/contrib/fakelowp/CMakeLists.txt
+ sed -i 's|caffe2_dnnlowp_avx2_ops fbgemm|caffe2_dnnlowp_avx2_ops|' caffe2/quantization/server/CMakeLists.txt
+ mkdir -p third_party/foxi
+ echo ''
+ sed -i '/if(NOT TARGET kineto)/,/endif()/d' cmake/Dependencies.cmake
+ sed -i 's|libkineto/include|libkineto/include\n/usr/include/kineto|' torch/CMakeLists.txt
+ sed -i 's|libkineto/include|libkineto/include\n/usr/include/kineto|' caffe2/CMakeLists.txt
+ mkdir -p third_party/onnx-tensorrt
+ echo ''
+ sed -i /nvonnxparser_static/d cmake/Dependencies.cmake
+ sed -i 's|onnx_trt_library|nvonnxparser_static|g' cmake/Dependencies.cmake
+ rm -rf torch/csrc/jit/serialization/mobile_bytecode_generated.h
+ flatc --cpp --gen-mutable --scoped-enums -o torch/csrc/jit/serialization -c torch/csrc/jit/serialization/mobile_bytecode.fbs
+ echo '// @generated'
+ sed -i '/find_package(RocksDB CONFIG)/d' modules/rocksdb/CMakeLists.txt
+ sed -i 's|RocksDB::rocksdb|RocksDB::rocksdb-shared|' modules/rocksdb/CMakeLists.txt
+ mv -f cmake/Modules_CUDA_fix/FindCUDNN.cmake cmake/Modules
+ rm -rf cmake/Modules_CUDA_fix
+ find . -type d -name FindCUDA -exec rm -rf '{}' ';'
+ sed -i -e '/install/{:a;/COMPONENT/bb;N;ba;:b;/Modules_CUDA_fix/d;}' CMakeLists.txt
+ sed -i -e 's|CMAKE_CUDA_FLAGS "-D|CMAKE_CUDA_FLAGS " -D|' CMakeLists.txt
+ sed -i '/install(EXPORT Caffe2Targets/,/dev)/d' CMakeLists.txt
+ sed -i 's|SYSTEM ||g' c10/CMakeLists.txt
+ sed -i 's|SYSTEM ||g' torch/CMakeLists.txt
+ sed -i 's|SYSTEM ||g' caffe2/CMakeLists.txt
+ sed -i 's|BEFORE SYSTEM ||g' cmake/ProtoBuf.cmake
+ sed -i 's|AFTER SYSTEM ||g' cmake/Dependencies.cmake
+ sed -i 's|BEFORE SYSTEM ||g' cmake/Dependencies.cmake
+ sed -i 's|SYSTEM ||g' cmake/Dependencies.cmake
+ sed -i '1i #include ' c10/util/Registry.h
+ sed -i '1i #include ' c10/core/DispatchKey.h
+ sed -i '1i #include ' torch/csrc/jit/runtime/logging.cpp
+ sed -i '1i #include ' torch/csrc/lazy/core/multi_wait.cpp
+ sed -i '1i #include "stdint.h"' torch/csrc/jit/passes/quantization/quantization_type.h
+ RPM_EC=0
++ jobs -p
+ exit 0
Executing(%build): /bin/sh -e /var/tmp/rpm-tmp.IiY90c
+ umask 022
+ cd /builddir/build/BUILD
+ CFLAGS='-O2 -fexceptions -grecord-gcc-switches -pipe -Wall -Wno-complain-wrong-lang -Werror=format-security -Wp,-U_FORTIFY_SOURCE,-D_FORTIFY_SOURCE=3 -Wp,-D_GLIBCXX_ASSERTIONS -mbranch-protection=standard -fasynchronous-unwind-tables -fstack-clash-protection -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer -w -fpermissive -Wno-sign-compare -Wno-deprecated-declarations -Wno-nonnull -DEIGEN_HAS_CXX11_MATH=1 '
+ export CFLAGS
+ CXXFLAGS='-O2 -fexceptions -grecord-gcc-switches -pipe -Wall -Wno-complain-wrong-lang -Werror=format-security -Wp,-U_FORTIFY_SOURCE,-D_FORTIFY_SOURCE=3 -Wp,-D_GLIBCXX_ASSERTIONS -mbranch-protection=standard -fasynchronous-unwind-tables -fstack-clash-protection -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer -w -fpermissive -Wno-sign-compare -Wno-deprecated-declarations -Wno-nonnull -DEIGEN_HAS_CXX11_MATH=1 '
+ export CXXFLAGS
+ FFLAGS='-O2 -fexceptions -grecord-gcc-switches -pipe -Wall -Wno-complain-wrong-lang -Werror=format-security -Wp,-U_FORTIFY_SOURCE,-D_FORTIFY_SOURCE=3 -Wp,-D_GLIBCXX_ASSERTIONS -mbranch-protection=standard -fasynchronous-unwind-tables -fstack-clash-protection -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer -w -fpermissive -Wno-sign-compare -Wno-deprecated-declarations -Wno-nonnull -DEIGEN_HAS_CXX11_MATH=1 -I/usr/lib64/gfortran/modules '
+ export FFLAGS
+ FCFLAGS='-O2 -fexceptions -grecord-gcc-switches -pipe -Wall -Wno-complain-wrong-lang -Werror=format-security -Wp,-U_FORTIFY_SOURCE,-D_FORTIFY_SOURCE=3 -Wp,-D_GLIBCXX_ASSERTIONS -mbranch-protection=standard -fasynchronous-unwind-tables -fstack-clash-protection -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer -w -fpermissive -Wno-sign-compare -Wno-deprecated-declarations -Wno-nonnull -DEIGEN_HAS_CXX11_MATH=1 -I/usr/lib64/gfortran/modules '
+ export FCFLAGS
+ VALAFLAGS=-g
+ export VALAFLAGS
+ RUSTFLAGS='-Copt-level=3 -Cdebuginfo=2 -Ccodegen-units=1 -Cstrip=none -Cforce-frame-pointers=yes -Clink-arg=-specs=/usr/lib/rpm/redhat/redhat-package-notes --cap-lints=warn'
+ export RUSTFLAGS
+ LDFLAGS='-Wl,-z,relro -Wl,--as-needed -Wl,--build-id=sha1 -specs=/usr/lib/rpm/redhat/redhat-package-notes -Wl,-lstdc++'
+ export LDFLAGS
+ LT_SYS_LIBRARY_PATH=/usr/lib64:
+ export LT_SYS_LIBRARY_PATH
+ CC=gcc
+ export CC
+ CXX=g++
+ export CXX
+ cd pytorch
+ mkdir build
+ pushd build
~/build/BUILD/pytorch/build ~/build/BUILD/pytorch
+ export ONNX_ML=0
+ ONNX_ML=0
+ export BUILD_SPLIT_CUDA=ON
+ BUILD_SPLIT_CUDA=ON
+ export REL_WITH_DEB_INFO=1
+ REL_WITH_DEB_INFO=1
+ export TORCH_NVCC_FLAGS=-DCUDA_HAS_FP16
+ TORCH_NVCC_FLAGS=-DCUDA_HAS_FP16
+ export PYTHON_EXECUTABLE=/usr/bin/python3
+ PYTHON_EXECUTABLE=/usr/bin/python3
+ export LDFLAGS=-Wl,-lstdc++
+ LDFLAGS=-Wl,-lstdc++
+ export LD_LIBRARY_PATH=/usr/local/cuda-12.3/lib64/
+ LD_LIBRARY_PATH=/usr/local/cuda-12.3/lib64/
+ CFLAGS='-O2 -fexceptions -grecord-gcc-switches -pipe -Wall -Wno-complain-wrong-lang -Werror=format-security -Wp,-U_FORTIFY_SOURCE,-D_FORTIFY_SOURCE=3 -Wp,-D_GLIBCXX_ASSERTIONS -mbranch-protection=standard -fasynchronous-unwind-tables -fstack-clash-protection -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer -w -fpermissive -Wno-sign-compare -Wno-deprecated-declarations -Wno-nonnull -DEIGEN_HAS_CXX11_MATH=1 '
+ export CFLAGS
+ CXXFLAGS='-O2 -fexceptions -grecord-gcc-switches -pipe -Wall -Wno-complain-wrong-lang -Werror=format-security -Wp,-U_FORTIFY_SOURCE,-D_FORTIFY_SOURCE=3 -Wp,-D_GLIBCXX_ASSERTIONS -mbranch-protection=standard -fasynchronous-unwind-tables -fstack-clash-protection -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer -w -fpermissive -Wno-sign-compare -Wno-deprecated-declarations -Wno-nonnull -DEIGEN_HAS_CXX11_MATH=1 '
+ export CXXFLAGS
+ FFLAGS='-O2 -fexceptions -grecord-gcc-switches -pipe -Wall -Wno-complain-wrong-lang -Werror=format-security -Wp,-U_FORTIFY_SOURCE,-D_FORTIFY_SOURCE=3 -Wp,-D_GLIBCXX_ASSERTIONS -mbranch-protection=standard -fasynchronous-unwind-tables -fstack-clash-protection -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer -w -fpermissive -Wno-sign-compare -Wno-deprecated-declarations -Wno-nonnull -DEIGEN_HAS_CXX11_MATH=1 -I/usr/lib64/gfortran/modules '
+ export FFLAGS
+ FCFLAGS='-O2 -fexceptions -grecord-gcc-switches -pipe -Wall -Wno-complain-wrong-lang -Werror=format-security -Wp,-U_FORTIFY_SOURCE,-D_FORTIFY_SOURCE=3 -Wp,-D_GLIBCXX_ASSERTIONS -mbranch-protection=standard -fasynchronous-unwind-tables -fstack-clash-protection -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer -w -fpermissive -Wno-sign-compare -Wno-deprecated-declarations -Wno-nonnull -DEIGEN_HAS_CXX11_MATH=1 -I/usr/lib64/gfortran/modules '
+ export FCFLAGS
+ VALAFLAGS=-g
+ export VALAFLAGS
+ RUSTFLAGS='-Copt-level=3 -Cdebuginfo=2 -Ccodegen-units=1 -Cstrip=none -Cforce-frame-pointers=yes -Clink-arg=-specs=/usr/lib/rpm/redhat/redhat-package-notes --cap-lints=warn'
+ export RUSTFLAGS
+ LDFLAGS=-Wl,-lstdc++
+ export LDFLAGS
+ LT_SYS_LIBRARY_PATH=/usr/lib64:
+ export
LT_SYS_LIBRARY_PATH + CC=gcc + export CC + CXX=g++ + export CXX + /usr/bin/cmake -DCMAKE_C_FLAGS_RELEASE:STRING=-DNDEBUG -DCMAKE_CXX_FLAGS_RELEASE:STRING=-DNDEBUG -DCMAKE_Fortran_FLAGS_RELEASE:STRING=-DNDEBUG -DCMAKE_VERBOSE_MAKEFILE:BOOL=ON -DCMAKE_INSTALL_DO_STRIP:BOOL=OFF -DCMAKE_INSTALL_PREFIX:PATH=/usr -DINCLUDE_INSTALL_DIR:PATH=/usr/include -DLIB_INSTALL_DIR:PATH=/usr/lib64 -DSYSCONF_INSTALL_DIR:PATH=/etc -DSHARE_INSTALL_PREFIX:PATH=/usr/share -DLIB_SUFFIX=64 -DBUILD_SHARED_LIBS:BOOL=ON .. -Wno-dev -DCMAKE_SKIP_RPATH=ON -DCMAKE_VERBOSE_MAKEFILE=OFF -DCMAKE_BUILD_TYPE=Release -DCMAKE_NO_SYSTEM_FROM_IMPORTED=ON -DCMAKE_SKIP_RULE_DEPENDENCY=ON -DCMAKE_SUPPRESS_REGENERATION=ON -DUSE_CCACHE=OFF -DHAVE_SOVERSION=ON -DUSE_NATIVE_ARCH=OFF -DUSE_DISTRIBUTED=ON -DBUILD_DOCS=OFF -DBUILD_PYTHON=ON -DBUILD_FUNCTORCH=ON -DBUILD_CAFFE2=OFF -DBUILD_BINARY=OFF -DBUILD_BENCHMARK=OFF -DBUILD_CUSTOM_PROTOBUF=OFF -DBUILDING_WITH_TORCH_LIBS=ON -DPYTHON_EXECUTABLE=/usr/bin/python3 -DPYBIND11_PYTHON_VERSION=3.12 -DCAFFE2_LINK_LOCAL_PROTOBUF=OFF -DONNX_ML=OFF -DUSE_GLOG=ON -DUSE_GFLAGS=ON -DUSE_OPENMP=ON -DUSE_KINETO=ON -DUSE_BREAKPAD=OFF -DUSE_SYSTEM_ONNX=ON -DUSE_SYSTEM_GLOO=ON -DUSE_SYSTEM_PYBIND11=ON -DUSE_SYSTEM_EIGEN_INSTALL=ON -DUSE_CUDA=ON -DUSE_CUDNN=ON -DUSE_NVRTC=ON -DUSE_CUPTI_SO=ON -DUSE_FAST_NVCC=ON -DUSE_SYSTEM_NCCL=ON -DCMAKE_CUDA_FLAGS=-fPIC -DCUDA_PROPAGATE_HOST_FLAGS=OFF '-DTORCH_CUDA_ARCH_LIST=5.2+PTX 6.1 7.5 8.6 8.9 9.0' -DCUDA_HOST_COMPILER=/usr/bin/cuda-g++ -DCMAKE_CUDA_HOST_COMPILER=/usr/bin/cuda-g++ -DCUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda-12.3 -DCMAKE_CUDA_COMPILER=/usr/local/cuda-12.3/bin/nvcc '-DCUDA_NVCC_FLAGS=--compiler-options;-fPIC;-Wno-deprecated-gpu-targets;-allow-unsupported-compiler;--fatbin-options;-compress-all' '-DCMAKE_CUDA_FLAGS=--compiler-options -fPIC -Wno-deprecated-gpu-targets -allow-unsupported-compiler --fatbin-options -compress-all' -DNCCL_INCLUDE_DIR=/usr/include/nccl -DUSE_MAGMA=ON -DBUILD_SPLIT_CUDA=ON -DUSE_TENSORRT=OFF 
-DBLAS=OpenBLAS -DUSE_MPI=OFF -DUSE_OBSERVERS=OFF -DUSE_ASAN=OFF -DUSE_ROCM=OFF -DUSE_MKLDNN=OFF -DUSE_FBGEMM=OFF -DUSE_NNPACK=ON -DUSE_QNNPACK=ON -DUSE_PYTORCH_QNNPACK=ON -DUSE_SYSTEM_FP16=ON -DUSE_SYSTEM_PSIMD=ON -DUSE_SYSTEM_SLEEF=ON -DUSE_SYSTEM_FXDIV=ON -DUSE_SYSTEM_XNNPACK=OFF -DUSE_SYSTEM_CPUINFO=ON -DUSE_SYSTEM_PTHREADPOOL=ON -DUSE_TENSORPIPE=ON -DUSE_FAKELOWP=OFF -DUSE_OPENCL=OFF -DUSE_GLOO=ON -DUSE_ZMQ=ON -DUSE_ZSTD=ON -DUSE_LMDB=ON -DUSE_REDIS=ON -DUSE_LEVELDB=ON -DUSE_ROCKSDB=ON -DUSE_FFMPEG=OFF -DUSE_OPENCV=ON -DUSE_METAL=OFF -DUSE_TBB=OFF -DUSE_LLVM=OFF -DATEN_NO_TEST=ON -- The CXX compiler identification is GNU 14.0.1 -- The C compiler identification is GNU 14.0.1 -- Detecting CXX compiler ABI info -- Detecting CXX compiler ABI info - done -- Check for working CXX compiler: /usr/bin/g++ - skipped -- Detecting CXX compile features -- Detecting CXX compile features - done -- Detecting C compiler ABI info -- Detecting C compiler ABI info - done -- Check for working C compiler: /usr/bin/gcc - skipped -- Detecting C compile features -- Detecting C compile features - done -- /usr/bin/g++ /builddir/build/BUILD/pytorch/torch/abi-check.cpp -o /builddir/build/BUILD/pytorch/build/abi-check -- Determined _GLIBCXX_USE_CXX11_ABI=1 -- Performing Test CAFFE2_NEED_TO_TURN_OFF_DEPRECATION_WARNING -- Performing Test CAFFE2_NEED_TO_TURN_OFF_DEPRECATION_WARNING - Failed -- Turning off deprecation warning due to glog. 
-- Performing Test C_HAS_AVX_1 -- Performing Test C_HAS_AVX_1 - Failed -- Performing Test C_HAS_AVX_2 -- Performing Test C_HAS_AVX_2 - Failed -- Performing Test C_HAS_AVX_3 -- Performing Test C_HAS_AVX_3 - Failed -- Performing Test C_HAS_AVX2_1 -- Performing Test C_HAS_AVX2_1 - Failed -- Performing Test C_HAS_AVX2_2 -- Performing Test C_HAS_AVX2_2 - Failed -- Performing Test C_HAS_AVX2_3 -- Performing Test C_HAS_AVX2_3 - Failed -- Performing Test C_HAS_AVX512_1 -- Performing Test C_HAS_AVX512_1 - Failed -- Performing Test C_HAS_AVX512_2 -- Performing Test C_HAS_AVX512_2 - Failed -- Performing Test C_HAS_AVX512_3 -- Performing Test C_HAS_AVX512_3 - Failed -- Performing Test CXX_HAS_AVX_1 -- Performing Test CXX_HAS_AVX_1 - Failed -- Performing Test CXX_HAS_AVX_2 -- Performing Test CXX_HAS_AVX_2 - Failed -- Performing Test CXX_HAS_AVX_3 -- Performing Test CXX_HAS_AVX_3 - Failed -- Performing Test CXX_HAS_AVX2_1 -- Performing Test CXX_HAS_AVX2_1 - Failed -- Performing Test CXX_HAS_AVX2_2 -- Performing Test CXX_HAS_AVX2_2 - Failed -- Performing Test CXX_HAS_AVX2_3 -- Performing Test CXX_HAS_AVX2_3 - Failed -- Performing Test CXX_HAS_AVX512_1 -- Performing Test CXX_HAS_AVX512_1 - Failed -- Performing Test CXX_HAS_AVX512_2 -- Performing Test CXX_HAS_AVX512_2 - Failed -- Performing Test CXX_HAS_AVX512_3 -- Performing Test CXX_HAS_AVX512_3 - Failed -- Performing Test CAFFE2_COMPILER_SUPPORTS_AVX512_EXTENSIONS -- Performing Test CAFFE2_COMPILER_SUPPORTS_AVX512_EXTENSIONS - Failed -- Performing Test COMPILER_SUPPORTS_HIDDEN_VISIBILITY -- Performing Test COMPILER_SUPPORTS_HIDDEN_VISIBILITY - Success -- Performing Test COMPILER_SUPPORTS_HIDDEN_INLINE_VISIBILITY -- Performing Test COMPILER_SUPPORTS_HIDDEN_INLINE_VISIBILITY - Success -- Performing Test COMPILER_SUPPORTS_RDYNAMIC -- Performing Test COMPILER_SUPPORTS_RDYNAMIC - Success -- Found CUDA: /usr/local/cuda-12.3 (found version "12.3") -- The CUDA compiler identification is NVIDIA 12.3.107 -- Detecting CUDA compiler ABI 
info -- Detecting CUDA compiler ABI info - done -- Check for working CUDA compiler: /usr/local/cuda-12.3/bin/nvcc - skipped -- Detecting CUDA compile features -- Detecting CUDA compile features - done -- Found CUDAToolkit: /usr/local/cuda-12.3/include (found version "12.3.107") -- Performing Test CMAKE_HAVE_LIBC_PTHREAD -- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success -- Found Threads: TRUE -- Caffe2: CUDA detected: 12.3 -- Caffe2: CUDA nvcc is: /usr/local/cuda-12.3/bin/nvcc -- Caffe2: CUDA toolkit directory: /usr/local/cuda-12.3 -- Caffe2: Header version is: 12.3 -- /usr/local/cuda-12.3/lib64/libnvrtc.so shorthash is 543806da -- Found CUDNN: /usr/lib64/libcudnn.so -- Could NOT find CUSPARSELT (missing: CUSPARSELT_LIBRARY_PATH CUSPARSELT_INCLUDE_PATH) CMake Warning at cmake/public/cuda.cmake:275 (message): Cannot find cuSPARSELt library. Turning the option off Call Stack (most recent call first): cmake/Dependencies.cmake:44 (include) CMakeLists.txt:760 (include) -- Added CUDA NVCC flags for: -gencode;arch=compute_52,code=sm_52;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_89,code=sm_89;-gencode;arch=compute_90,code=sm_90;-gencode;arch=compute_52,code=compute_52 -- Caffe2: Found protobuf with new-style protobuf targets. -- Caffe2 protobuf include directory: /usr/include -- Trying to find preferred BLAS backend of choice: OpenBLAS -- Found OpenBLAS libraries: /usr/lib64/libopenblaso.so -- Found OpenBLAS include: /usr/include/openblas -- Using pocketfft in directory: /builddir/build/BUILD/pytorch/third_party/pocketfft/ -- Found pthreadpool: /usr/lib64/libpthreadpool.so Found cpuinfo: /usr/lib64/libcpuinfo.so -- The ASM compiler identification is GNU -- Found assembler: /usr/bin/gcc -- Caffe2: Found gflags with new-style gflags target. -- Caffe2: Cannot find glog automatically. Using legacy find. 
-- Found glog: /usr/include -- Caffe2: Found glog (include: /usr/include, library: /usr/lib64/libglog.so) CMake Warning at cmake/Dependencies.cmake:848 (message): Turning USE_FAKELOWP off as it depends on USE_FBGEMM. Call Stack (most recent call first): CMakeLists.txt:760 (include) -- Found LMDB: /usr/include -- Found lmdb (include: /usr/include, library: /usr/lib64/liblmdb.so) -- Found LevelDB: /usr/include -- Found LevelDB (include: /usr/include, library: /usr/lib64/libleveldb.so) -- Found Snappy: /usr/include -- Found Snappy (include: /usr/include, library: /usr/lib64/libsnappy.so) -- Found Numa: /usr/include -- Found Numa (include: /usr/include, library: /usr/lib64/libnuma.so) -- Found ZMQ: /usr/include -- Found ZMQ (include: /usr/include, library: /usr/lib64/libzmq.so) -- Found Hiredis: /usr/include -- Found Hiredis (include: /usr/include, library: /usr/lib64/libhiredis.so) -- OpenCV found (/usr/lib64/cmake/opencv4) -- Found system Eigen at /usr/include/eigen3 -- Setting Python's include dir to /usr/include/python3.12 from sysconfig -- Setting Python's library to /usr/lib64/python3.12 -- Found PythonInterp: /usr/bin/python3 (found suitable version "3.12.2", minimum required is "3.0") -- Found PythonLibs: /usr/lib64/python3.12 (found suitable version "3.12.2", minimum required is "3.0") -- Found NumPy: /usr/lib64/python3.12/site-packages/numpy/core/include (found version "1.26.4") -- NumPy ver. 
1.26.4 found (include: /usr/lib64/python3.12/site-packages/numpy/core/include) -- Found PythonInterp: /usr/bin/python3 (found suitable version "3.12.2", minimum required is "3.12") -- Found PythonLibs: /usr/lib64/python3.12 -- Performing Test HAS_FLTO -- Performing Test HAS_FLTO - Success -- Found pybind11: /usr/include (found version "2.11.1") -- pybind11 include dirs: /usr/include;/usr/include/python3.12 -- Check OMP with lib /usr/lib/gcc/aarch64-redhat-linux/14/libgomp.so and flags -fopenmp -v -- Check OMP with lib /usr/lib/gcc/aarch64-redhat-linux/14/libgomp.so and flags -fopenmp -v -- Found OpenMP_C: -fopenmp (found version "4.5") -- Found OpenMP_CXX: -fopenmp (found version "4.5") -- Found OpenMP: TRUE (found version "4.5") -- Adding OpenMP CXX_FLAGS: -fopenmp -- Will link against OpenMP libraries: /usr/lib/gcc/aarch64-redhat-linux/14/libgomp.so -- Found NCCL: /usr/include -- Determining NCCL version from /usr/include/nccl.h... -- Looking for NCCL_VERSION_CODE -- Looking for NCCL_VERSION_CODE - not found -- NCCL version < 2.3.5-5 -- Found NCCL (include: /usr/include, library: /usr/lib64/libnccl.so) -- Found CUB: /usr/local/cuda-12.3/include -- Converting CMAKE_CUDA_FLAGS to CUDA_NVCC_FLAGS: CUDA_NVCC_FLAGS = 
--compiler-options;-fPIC;-Wno-deprecated-gpu-targets;-allow-unsupported-compiler;--fatbin-options;-compress-all;-DLIBCUDACXX_ENABLE_SIMPLIFIED_COMPLEX_OPERATIONS;-D_GLIBCXX_USE_CXX11_ABI=1;-Xfatbin;-compress-all;--compiler-options;-fPIC;-Wno-deprecated-gpu-targets;-allow-unsupported-compiler;--fatbin-options;-compress-all;-DONNX_NAMESPACE=onnx;-gencode;arch=compute_52,code=sm_52;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_89,code=sm_89;-gencode;arch=compute_90,code=sm_90;-gencode;arch=compute_52,code=compute_52;-Xcudafe;--diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl;--expt-relaxed-constexpr;--expt-extended-lambda CUDA_NVCC_FLAGS_DEBUG = -g CUDA_NVCC_FLAGS_RELEASE = -O3;-DNDEBUG CUDA_NVCC_FLAGS_RELWITHDEBINFO = -O2;-g;-DNDEBUG CUDA_NVCC_FLAGS_MINSIZEREL = -O1;-DNDEBUG Found gloo: /usr/lib64/libgloo.so -- Found onnx: /usr/lib64/libonnx.so /usr/lib64/libonnx_proto.so -- Found CUDA with FP16 support, compiling with torch.cuda.HalfTensor -- Adding -DNDEBUG to compile flags -- Checking prototype magma_get_sgeqrf_nb for MAGMA_V2 -- Checking prototype magma_get_sgeqrf_nb for MAGMA_V2 - False -- Compiling with MAGMA support -- MAGMA INCLUDE DIRECTORIES: /usr/include -- MAGMA LIBRARIES: /usr/lib64/libmagma.so -- MAGMA V2 check: 0 -- Could not find hardware support for NEON on this machine. -- No OMAP3 processor on this machine. -- No OMAP4 processor on this machine. -- asimd/Neon found with compiler flag : -D__NEON__ -- Looking for cheev_ -- Looking for cheev_ - found -- Looking for cgesdd_ -- Looking for cgesdd_ - found -- Found a library with LAPACK API (open). -- MIOpen not found. 
Compiling without MIOpen support disabling ROCM because NOT USE_ROCM is set disabling MKLDNN because USE_MKLDNN is not set -- Looking for clock_gettime in rt -- Looking for clock_gettime in rt - found -- Looking for mmap -- Looking for mmap - found -- Looking for shm_open -- Looking for shm_open - found -- Looking for shm_unlink -- Looking for shm_unlink - found -- Looking for malloc_usable_size -- Looking for malloc_usable_size - found -- -- check z16 -- Performing Test COMPILE_OUT_z16 -- Performing Test COMPILE_OUT_z16 - Failed -- Performing Test COMPILE_OUT_z15 -- check z15 -- Performing Test COMPILE_OUT_z15 - Failed -- Performing Test COMPILE_OUT_z14 -- check z14 -- Performing Test COMPILE_OUT_z14 - Failed -- -- Version: 10.2.1 -- Build type: Release -- Using Kineto with CUPTI support -- Configuring Kineto dependency: -- KINETO_SOURCE_DIR = /builddir/build/BUILD/pytorch/third_party/kineto/libkineto -- KINETO_BUILD_TESTS = OFF -- KINETO_LIBRARY_TYPE = static -- CUDA_SOURCE_DIR = /usr/local/cuda-12.3 -- CUDA_INCLUDE_DIRS = /usr/local/cuda-12.3/include -- CUPTI_INCLUDE_DIR = /usr/local/cuda-12.3/include -- CUDA_cupti_LIBRARY = /usr/local/cuda-12.3/lib64/libcupti.so -- Found CUPTI -- Configured Kineto -- GCC 14.0.1: Adding gcc and gcc_s libs to link line -- Performing Test HAS_WERROR_RETURN_TYPE -- Performing Test HAS_WERROR_RETURN_TYPE - Success -- Performing Test HAS_WERROR_NON_VIRTUAL_DTOR -- Performing Test HAS_WERROR_NON_VIRTUAL_DTOR - Success -- Performing Test HAS_WERROR_BRACED_SCALAR_INIT -- Performing Test HAS_WERROR_BRACED_SCALAR_INIT - Failed -- Performing Test HAS_WERROR_RANGE_LOOP_CONSTRUCT -- Performing Test HAS_WERROR_RANGE_LOOP_CONSTRUCT - Success -- Performing Test HAS_WERROR_BOOL_OPERATION -- Performing Test HAS_WERROR_BOOL_OPERATION - Success -- Performing Test HAS_WNARROWING -- Performing Test HAS_WNARROWING - Success -- Performing Test HAS_WNO_MISSING_FIELD_INITIALIZERS -- Performing Test HAS_WNO_MISSING_FIELD_INITIALIZERS - Success -- 
Performing Test HAS_WNO_TYPE_LIMITS -- Performing Test HAS_WNO_TYPE_LIMITS - Success -- Performing Test HAS_WNO_ARRAY_BOUNDS -- Performing Test HAS_WNO_ARRAY_BOUNDS - Success -- Performing Test HAS_WNO_UNKNOWN_PRAGMAS -- Performing Test HAS_WNO_UNKNOWN_PRAGMAS - Success -- Performing Test HAS_WNO_UNUSED_PARAMETER -- Performing Test HAS_WNO_UNUSED_PARAMETER - Success -- Performing Test HAS_WNO_UNUSED_FUNCTION -- Performing Test HAS_WNO_UNUSED_FUNCTION - Success -- Performing Test HAS_WNO_UNUSED_RESULT -- Performing Test HAS_WNO_UNUSED_RESULT - Success -- Performing Test HAS_WNO_STRICT_OVERFLOW -- Performing Test HAS_WNO_STRICT_OVERFLOW - Success -- Performing Test HAS_WNO_STRICT_ALIASING -- Performing Test HAS_WNO_STRICT_ALIASING - Success -- Performing Test HAS_WNO_STRINGOP_OVERFLOW -- Performing Test HAS_WNO_STRINGOP_OVERFLOW - Success -- Performing Test HAS_WVLA_EXTENSION -- Performing Test HAS_WVLA_EXTENSION - Failed -- Performing Test HAS_WSUGGEST_OVERRIDE -- Performing Test HAS_WSUGGEST_OVERRIDE - Success -- Performing Test HAS_WNEWLINE_EOF -- Performing Test HAS_WNEWLINE_EOF - Failed -- Performing Test HAS_WINCONSISTENT_MISSING_OVERRIDE -- Performing Test HAS_WINCONSISTENT_MISSING_OVERRIDE - Failed -- Performing Test HAS_WINCONSISTENT_MISSING_DESTRUCTOR_OVERRIDE -- Performing Test HAS_WINCONSISTENT_MISSING_DESTRUCTOR_OVERRIDE - Failed -- Performing Test HAS_WNO_ERROR_PEDANTIC -- Performing Test HAS_WNO_ERROR_PEDANTIC - Success -- Performing Test HAS_WNO_ERROR_OLD_STYLE_CAST -- Performing Test HAS_WNO_ERROR_OLD_STYLE_CAST - Success -- Performing Test HAS_WNO_ERROR_INCONSISTENT_MISSING_OVERRIDE -- Performing Test HAS_WNO_ERROR_INCONSISTENT_MISSING_OVERRIDE - Failed -- Performing Test HAS_WNO_ERROR_INCONSISTENT_MISSING_DESTRUCTOR_OVERRIDE -- Performing Test HAS_WNO_ERROR_INCONSISTENT_MISSING_DESTRUCTOR_OVERRIDE - Failed -- Performing Test HAS_WCONSTANT_CONVERSION -- Performing Test HAS_WCONSTANT_CONVERSION - Failed -- Performing Test 
HAS_WNO_INVALID_PARTIAL_SPECIALIZATION -- Performing Test HAS_WNO_INVALID_PARTIAL_SPECIALIZATION - Failed -- Performing Test HAS_WNO_ALIGNED_ALLOCATION_UNAVAILABLE -- Performing Test HAS_WNO_ALIGNED_ALLOCATION_UNAVAILABLE - Failed -- Performing Test HAS_WNO_MISSING_BRACES -- Performing Test HAS_WNO_MISSING_BRACES - Success -- Performing Test HAS_QUNUSED_ARGUMENTS -- Performing Test HAS_QUNUSED_ARGUMENTS - Failed -- Performing Test HAS_FDIAGNOSTICS_COLOR_ALWAYS -- Performing Test HAS_FDIAGNOSTICS_COLOR_ALWAYS - Success -- Performing Test HAS_FALIGNED_NEW -- Performing Test HAS_FALIGNED_NEW - Success -- Performing Test HAS_WNO_UNUSED_BUT_SET_VARIABLE -- Performing Test HAS_WNO_UNUSED_BUT_SET_VARIABLE - Success -- Performing Test HAS_WNO_MAYBE_UNINITIALIZED -- Performing Test HAS_WNO_MAYBE_UNINITIALIZED - Success -- Performing Test HAS_FSTANDALONE_DEBUG -- Performing Test HAS_FSTANDALONE_DEBUG - Failed -- Performing Test HAS_FNO_MATH_ERRNO -- Performing Test HAS_FNO_MATH_ERRNO - Success -- Performing Test HAS_FNO_TRAPPING_MATH -- Performing Test HAS_FNO_TRAPPING_MATH - Success -- Performing Test HAS_WERROR_FORMAT -- Performing Test HAS_WERROR_FORMAT - Success -- Performing Test HAS_VST1 -- Performing Test HAS_VST1 - Success -- Performing Test HAS_VLD1 -- Performing Test HAS_VLD1 - Success -- Performing Test HAS_WDEPRECATED -- Performing Test HAS_WDEPRECATED - Success -- NUMA paths: -- /usr/include -- /usr/lib64/libnuma.so -- Looking for backtrace -- Looking for backtrace - found -- backtrace facility detected in default set of libraries -- Found Backtrace: /usr/include -- headers outputs: -- sources outputs: -- declarations_yaml outputs: -- Using ATen parallel backend: OMP Found sleef: /usr/lib64/libsleef.so AT_INSTALL_INCLUDE_DIR include/ATen/core core header install: /builddir/build/BUILD/pytorch/build/aten/src/ATen/core/TensorBody.h core header install: /builddir/build/BUILD/pytorch/build/aten/src/ATen/core/aten_interned_strings.h core header install: 
/builddir/build/BUILD/pytorch/build/aten/src/ATen/core/enum_tag.h
disable test because ATEN_NO_TEST is set
-- Performing Test HAS_WNO_DEPRECATED_COPY
-- Performing Test HAS_WNO_DEPRECATED_COPY - Success
-- _GLIBCXX_USE_CXX11_ABI=1 is already defined as a cmake variable
-- Using lib/python3.12/site-packages as python relative installation path
--
-- ******** Summary ********
-- General:
-- CMake version : 3.28.2
-- CMake command : /usr/bin/cmake
-- System : Linux
-- C++ compiler : /usr/bin/g++
-- C++ compiler id : GNU
-- C++ compiler version : 14.0.1
-- Using ccache if found : OFF
-- CXX flags : -O2 -fexceptions -grecord-gcc-switches -pipe -Wall -Wno-complain-wrong-lang -Werror=format-security -Wp,-U_FORTIFY_SOURCE,-D_FORTIFY_SOURCE=3 -Wp,-D_GLIBCXX_ASSERTIONS -mbranch-protection=standard -fasynchronous-unwind-tables -fstack-clash-protection -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer -w -fpermissive -Wno-sign-compare -Wno-deprecated-declarations -Wno-nonnull -DEIGEN_HAS_CXX11_MATH=1 -D_GLIBCXX_USE_CXX11_ABI=1 -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DTMP_LIBKINETO_NANOSECOND -DLIBKINETO_NOROCTRACER -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=range-loop-construct -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-stringop-overflow -Wsuggest-override -Wno-psabi -Wno-error=pedantic -Wno-error=old-style-cast -Wno-missing-braces -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow
-- Build type : Release
-- Compile definitions : ONNXIFI_ENABLE_EXT=1;ONNX_NAMESPACE=onnx;HAVE_MMAP=1;_FILE_OFFSET_BITS=64;HAVE_SHM_OPEN=1;HAVE_SHM_UNLINK=1;HAVE_MALLOC_USABLE_SIZE=1;USE_EXTERNAL_MZCRC;MINIZ_DISABLE_ZIP_READER_CRC32_CHECKS;FLASHATTENTION_DISABLE_ALIBI
-- CMAKE_PREFIX_PATH : /usr/local/cuda-12.3;/usr/local/cuda-12.3;/usr/local/cuda-12.3
-- CMAKE_INSTALL_PREFIX : /usr
-- USE_GOLD_LINKER : OFF
--
-- TORCH_VERSION : 2.4.0
-- BUILD_CAFFE2 : OFF
-- BUILD_CAFFE2_OPS : OFF
-- BUILD_STATIC_RUNTIME_BENCHMARK: OFF
-- BUILD_BINARY : OFF
-- BUILD_CUSTOM_PROTOBUF : OFF
-- Protobuf compiler : /usr/bin/protoc
-- Protobuf includes : /usr/include
-- Protobuf libraries : /usr/lib64/libprotobuf.so
-- BUILD_DOCS : OFF
-- BUILD_PYTHON : ON
-- Python version : 3.12.2
-- Python executable : /usr/bin/python3
-- Pythonlibs version : 3.12.2
-- Python library : /usr/lib64/python3.12
-- Python includes : /usr/include/python3.12
-- Python site-packages: lib/python3.12/site-packages
-- BUILD_SHARED_LIBS : ON
-- CAFFE2_USE_MSVC_STATIC_RUNTIME : OFF
-- BUILD_TEST : OFF
-- BUILD_JNI : OFF
-- BUILD_MOBILE_AUTOGRAD : OFF
-- BUILD_LITE_INTERPRETER: OFF
-- INTERN_BUILD_MOBILE :
-- TRACING_BASED : OFF
-- USE_BLAS : 1
-- BLAS : open
-- BLAS_HAS_SBGEMM :
-- USE_LAPACK : 1
-- LAPACK : open
-- USE_ASAN : OFF
-- USE_TSAN : OFF
-- USE_CPP_CODE_COVERAGE : OFF
-- USE_CUDA : ON
-- Split CUDA : ON
-- CUDA static link : OFF
-- USE_CUDNN : ON
-- USE_EXPERIMENTAL_CUDNN_V8_API:
-- USE_CUSPARSELT : OFF
-- CUDA version : 12.3
-- USE_FLASH_ATTENTION : ON
-- USE_MEM_EFF_ATTENTION : ON
-- cuDNN version : 8.9.7
-- CUDA root directory : /usr/local/cuda-12.3
-- CUDA library : /usr/local/cuda-12.3/lib64/stubs/libcuda.so
-- cudart library : /usr/local/cuda-12.3/lib64/libcudart.so
-- cublas library : /usr/local/cuda-12.3/lib64/libcublas.so
-- cufft library : /usr/local/cuda-12.3/lib64/libcufft.so
-- curand library : /usr/local/cuda-12.3/lib64/libcurand.so
-- cusparse library : /usr/local/cuda-12.3/lib64/libcusparse.so
-- cuDNN library : /usr/lib64/libcudnn.so
-- nvrtc : /usr/local/cuda-12.3/lib64/libnvrtc.so
-- CUDA include path : /usr/local/cuda-12.3/include
-- NVCC executable : /usr/local/cuda-12.3/bin/nvcc
-- CUDA compiler : /usr/local/cuda-12.3/bin/nvcc
-- CUDA flags : --compiler-options -fPIC -Wno-deprecated-gpu-targets -allow-unsupported-compiler --fatbin-options -compress-all -DLIBCUDACXX_ENABLE_SIMPLIFIED_COMPLEX_OPERATIONS -D_GLIBCXX_USE_CXX11_ABI=1 -Xfatbin -compress-all --compiler-options -fPIC -Wno-deprecated-gpu-targets -allow-unsupported-compiler --fatbin-options -compress-all -DONNX_NAMESPACE=onnx -gencode arch=compute_52,code=sm_52 -gencode arch=compute_61,code=sm_61 -gencode arch=compute_75,code=sm_75 -gencode arch=compute_86,code=sm_86 -gencode arch=compute_89,code=sm_89 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_52,code=compute_52 -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -DCUDA_HAS_FP16 -Wno-deprecated-gpu-targets --expt-extended-lambda -DCUB_WRAPPED_NAMESPACE=at_cuda_detail -DCUDA_HAS_FP16=1 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__
-- CUDA host compiler : /usr/bin/cuda-g++
-- CUDA --device-c : OFF
-- USE_TENSORRT : OFF
-- USE_XPU : OFF
-- USE_ROCM : OFF
-- BUILD_NVFUSER :
-- USE_EIGEN_FOR_BLAS :
-- USE_FBGEMM : OFF
-- USE_FAKELOWP : OFF
-- USE_KINETO : ON
-- USE_FFMPEG : OFF
-- USE_GFLAGS : ON
-- USE_GLOG : ON
-- USE_LEVELDB : ON
-- LevelDB version : 1.23
-- Snappy version : 1.1.10
-- USE_LITE_PROTO : OFF
-- USE_LMDB : ON
-- LMDB version : 0.9.32
-- USE_METAL : OFF
-- USE_PYTORCH_METAL : OFF
-- USE_PYTORCH_METAL_EXPORT : OFF
-- USE_MPS : OFF
-- USE_MKL :
-- USE_MKLDNN : OFF
-- USE_UCC : OFF
-- USE_ITT : OFF
-- USE_NCCL : ON
-- USE_SYSTEM_NCCL : ON
-- USE_NNPACK : ON
-- USE_NUMPY : ON
-- USE_OBSERVERS : ON
-- USE_OPENCL : OFF
-- USE_OPENCV : ON
-- OpenCV version : 4.9.0
-- USE_OPENMP : ON
-- USE_TBB : OFF
-- USE_MIMALLOC : OFF
-- USE_VULKAN : OFF
-- USE_PROF : OFF
-- USE_QNNPACK : ON
-- USE_PYTORCH_QNNPACK : ON
-- USE_XNNPACK : ON
-- USE_REDIS : ON
-- USE_ROCKSDB : ON
-- USE_ZMQ : ON
-- USE_DISTRIBUTED : ON
-- USE_MPI : OFF
-- USE_GLOO : ON
-- USE_GLOO_WITH_OPENSSL : OFF
-- USE_TENSORPIPE : ON
-- Public Dependencies :
-- Private Dependencies : Threads::Threads;/usr/lib64/libopenblaso.so;pthreadpool;cpuinfo;qnnpack;pytorch_qnnpack;XNNPACK;/usr/lib64/liblmdb.so;/usr/lib64/libleveldb.so;/usr/lib64/libsnappy.so;/usr/lib64/libzmq.so;/usr/lib64/libhiredis.so;opencv_core;opencv_highgui;opencv_imgproc;opencv_imgcodecs;opencv_optflow;opencv_videoio;opencv_video;caffe2::openmp;tensorpipe;gloo;onnx_proto;onnx;onnx_optimizer;foxi_loader;rt;fmt::fmt-header-only;kineto;gcc_s;gcc;dl
-- Public CUDA Deps. : caffe2::cuda;caffe2::nvrtc
-- Private CUDA Deps. : caffe2::curand;caffe2::cufft;caffe2::cublas;torch::cudnn;__caffe2_nccl;tensorpipe_cuda;gloo_cuda;/usr/local/cuda-12.3/lib64/libcudart.so;CUDA::cusparse;CUDA::cufft;ATEN_CUDA_FILES_GEN_LIB
-- USE_COREML_DELEGATE : OFF
-- BUILD_LAZY_TS_BACKEND : ON
-- USE_ROCM_KERNEL_ASSERT : OFF
-- Performing Test HAS_WMISSING_PROTOTYPES
-- Performing Test HAS_WMISSING_PROTOTYPES - Success
-- Performing Test HAS_WERROR_MISSING_PROTOTYPES
-- Performing Test HAS_WERROR_MISSING_PROTOTYPES - Success
-- Configuring done (22.4s)
CMake Warning at torch/CMakeLists.txt:282 (target_link_libraries):
  Target "_C" requests linking to directory "/usr/lib64/python3.12".
  Targets may link only to libraries. CMake is dropping the item.
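The configure summary above pins exact dependency versions (cuDNN 8.9.7, LMDB 0.9.32, LevelDB 1.23) that CMake recovers by parsing `#define` version macros out of each library's headers. A minimal sketch of that parsing technique, run against a stand-in header (the file name and values below are illustrative; the real probe reads e.g. cuDNN's version header):

```shell
# Write a stand-in header with cuDNN-style version macros, then extract
# them the way a CMake find-module does: strip the '#define NAME ' prefix.
cat > /tmp/fake_cudnn_version.h <<'EOF'
#define CUDNN_MAJOR 8
#define CUDNN_MINOR 9
#define CUDNN_PATCHLEVEL 7
EOF
major=$(sed -n 's/^#define CUDNN_MAJOR *//p' /tmp/fake_cudnn_version.h)
minor=$(sed -n 's/^#define CUDNN_MINOR *//p' /tmp/fake_cudnn_version.h)
patch=$(sed -n 's/^#define CUDNN_PATCHLEVEL *//p' /tmp/fake_cudnn_version.h)
echo "${major}.${minor}.${patch}"   # → 8.9.7
```

The same pattern explains the `NCCL version < 2.3.5-5` message earlier: the probe greps `/usr/include/nccl.h` for `NCCL_VERSION_CODE` and falls back to a conservative assumption when the macro is absent.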
-- Generating done (1.1s)
CMake Warning:
  Manually-specified variables were not used by the project:
    CMAKE_Fortran_FLAGS_RELEASE
    CMAKE_INSTALL_DO_STRIP
    INCLUDE_INSTALL_DIR
    LIB_INSTALL_DIR
    LIB_SUFFIX
    SHARE_INSTALL_PREFIX
    SYSCONF_INSTALL_DIR
    USE_BREAKPAD
    USE_FAST_NVCC
-- Build files have been written to: /builddir/build/BUILD/pytorch/build
+ make -j4
[ 0%] Linking C static library ../../lib/libfxdiv.a
[ 0%] Linking C static library ../../lib/libfp16.a
[ 0%] Building C object confu-deps/clog/CMakeFiles/clog.dir/src/clog.c.o
[ 0%] Linking C static library ../../lib/libpsimd.a
[ 0%] Built target psimd
[ 0%] Built target fp16
[ 0%] Built target fxdiv
[ 0%] Building C object confu-deps/XNNPACK/CMakeFiles/normalization.dir/src/normalization.c.o
[ 0%] Building C object confu-deps/XNNPACK/CMakeFiles/logging.dir/src/enums/datatype-strings.c.o
[ 0%] Building C object confu-deps/XNNPACK/CMakeFiles/microparams-init.dir/src/microparams-init.c.o
[ 0%] Building C object confu-deps/XNNPACK/CMakeFiles/logging.dir/src/enums/microkernel-type.c.o
[ 0%] Building C object confu-deps/XNNPACK/CMakeFiles/logging.dir/src/enums/node-type.c.o
[ 0%] Building C object confu-deps/XNNPACK/CMakeFiles/logging.dir/src/enums/operator-type.c.o
[ 0%] Building C object confu-deps/XNNPACK/CMakeFiles/logging.dir/src/log.c.o
[ 0%] Linking C static library ../../lib/libclog.a
[ 0%] Built target logging
[ 0%] Built target normalization
[ 0%] Building C object confu-deps/XNNPACK/CMakeFiles/packing.dir/src/packing.c.o
[ 0%] Building C object confu-deps/XNNPACK/CMakeFiles/allocator.dir/src/allocator.c.o
[ 0%] Built target clog
[ 0%] Building C object confu-deps/XNNPACK/CMakeFiles/memory.dir/src/memory.c.o
[ 0%] Built target allocator
[ 0%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernel-utils.dir/src/microkernel-utils.c.o
[ 0%] Built target microkernel-utils
[ 0%] Building C object confu-deps/XNNPACK/CMakeFiles/mutex.dir/src/mutex.c.o
[ 0%] Built target memory
[ 0%] Building C object
confu-deps/XNNPACK/CMakeFiles/post-operation.dir/src/operators/post-operation.c.o
[ 0%] Built target mutex
[ 0%] Building C object confu-deps/XNNPACK/CMakeFiles/operator-utils.dir/src/operator-utils.c.o
[ 0%] Built target post-operation
[ 0%] Building C object confu-deps/XNNPACK/CMakeFiles/operator-run.dir/src/operator-run.c.o
[ 0%] Built target operator-utils
[ 0%] Building CXX object confu-deps/XNNPACK/CMakeFiles/convolution-test-helpers.dir/test/convolution-test-helpers.cc.o
[ 0%] Built target microparams-init
[ 0%] Building CXX object third_party/fmt/CMakeFiles/fmt.dir/src/format.cc.o
[ 0%] Built target convolution-test-helpers
[ 0%] Building CXX object c10/CMakeFiles/c10.dir/core/Allocator.cpp.o
[ 0%] Built target operator-run
[ 0%] Running C++/Python protocol buffer compiler on /builddir/build/BUILD/pytorch/caffe2/proto/torch.proto
[ 0%] Running C++/Python protocol buffer compiler on /builddir/build/BUILD/pytorch/caffe2/proto/caffe2.proto
[ 0%] Building CXX object caffe2/proto/CMakeFiles/Caffe2_PROTO.dir/torch.pb.cc.o
[ 0%] Building CXX object c10/CMakeFiles/c10.dir/core/AutogradState.cpp.o
[ 0%] Building CXX object c10/CMakeFiles/c10.dir/core/CPUAllocator.cpp.o
[ 0%] Built target packing
[ 0%] Building CXX object caffe2/CMakeFiles/caffe2_nvrtc.dir/__/aten/src/ATen/cuda/nvrtc_stub/ATenNVRTC.cpp.o
[ 0%] Building CXX object c10/CMakeFiles/c10.dir/core/ConstantSymNodeImpl.cpp.o
[ 0%] Linking CXX shared library ../lib/libcaffe2_nvrtc.so
Warning: Unused direct dependencies: libcuda.so.1 /lib64/libm.so.6 /lib64/libgcc_s.so.1
[ 0%] Built target caffe2_nvrtc
[ 0%] Building CXX object caffe2/proto/CMakeFiles/Caffe2_PROTO.dir/caffe2.pb.cc.o
[ 0%] Generating ATen headers
[ 0%] Building CXX object c10/CMakeFiles/c10.dir/core/CopyBytes.cpp.o
[ 0%] Building CXX object c10/CMakeFiles/c10.dir/core/DefaultDtype.cpp.o
[ 0%] Building CXX object third_party/fmt/CMakeFiles/fmt.dir/src/os.cc.o
[ 0%] Linking CXX static library ../../lib/libfmt.a
[ 0%] Built target fmt
[ 0%] Generating ATen headers
[ 0%] Building CXX object c10/CMakeFiles/c10.dir/core/Device.cpp.o
[ 0%] Building CXX object c10/CMakeFiles/c10.dir/core/DeviceType.cpp.o
[ 0%] Building CXX object c10/CMakeFiles/c10.dir/core/DispatchKey.cpp.o
[ 0%] Building CXX object c10/CMakeFiles/c10.dir/core/DispatchKeySet.cpp.o
[ 0%] Built target Caffe2_PROTO
[ 0%] Building C object caffe2/CMakeFiles/torch_global_deps.dir/__/torch/csrc/empty.c.o
[ 0%] Linking C shared library ../lib/libtorch_global_deps.so
Warning: Unused direct dependencies: /lib64/libstdc++.so.6 /usr/local/cuda-12.3/lib64/libnvrtc.so.12 libcuda.so.1 /usr/local/cuda-12.3/lib64/libcudart.so.12 /usr/local/cuda-12.3/lib64/libnvToolsExt.so.1
[ 0%] Built target torch_global_deps
[ 0%] Built target python_copy_files
[ 0%] Generating /builddir/build/BUILD/pytorch/torch/csrc/autograd/generated/Functions.cpp, /builddir/build/BUILD/pytorch/torch/csrc/autograd/generated/ViewFuncs.cpp, /builddir/build/BUILD/pytorch/torch/csrc/autograd/generated/VariableType_0.cpp, /builddir/build/BUILD/pytorch/torch/csrc/autograd/generated/VariableType_1.cpp, /builddir/build/BUILD/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp, /builddir/build/BUILD/pytorch/torch/csrc/autograd/generated/VariableType_3.cpp, /builddir/build/BUILD/pytorch/torch/csrc/autograd/generated/VariableType_4.cpp, /builddir/build/BUILD/pytorch/torch/csrc/autograd/generated/TraceType_0.cpp, /builddir/build/BUILD/pytorch/torch/csrc/autograd/generated/TraceType_1.cpp, /builddir/build/BUILD/pytorch/torch/csrc/autograd/generated/TraceType_2.cpp, /builddir/build/BUILD/pytorch/torch/csrc/autograd/generated/TraceType_3.cpp, /builddir/build/BUILD/pytorch/torch/csrc/autograd/generated/TraceType_4.cpp, /builddir/build/BUILD/pytorch/torch/csrc/autograd/generated/ADInplaceOrViewType_0.cpp, /builddir/build/BUILD/pytorch/torch/csrc/autograd/generated/ADInplaceOrViewType_1.cpp, /builddir/build/BUILD/pytorch/torch/csrc/inductor/aoti_torch/generated/c_shim_cpu.cpp, /builddir/build/BUILD/pytorch/torch/csrc/lazy/generated/LazyNativeFunctions.cpp, /builddir/build/BUILD/pytorch/torch/csrc/lazy/generated/RegisterAutogradLazy.cpp, /builddir/build/BUILD/pytorch/torch/csrc/lazy/generated/RegisterLazy.cpp, /builddir/build/BUILD/pytorch/torch/csrc/autograd/generated/Functions.h, /builddir/build/BUILD/pytorch/torch/csrc/autograd/generated/variable_factories.h, /builddir/build/BUILD/pytorch/torch/csrc/autograd/generated/ViewFuncs.h, /builddir/build/BUILD/pytorch/torch/csrc/autograd/generated/VariableType.h, /builddir/build/BUILD/pytorch/torch/csrc/lazy/generated/LazyIr.h, /builddir/build/BUILD/pytorch/torch/csrc/lazy/generated/LazyNonNativeIr.h, /builddir/build/BUILD/pytorch/torch/csrc/lazy/generated/LazyNativeFunctions.h, /builddir/build/BUILD/pytorch/torch/csrc/autograd/generated/python_functions_0.cpp, /builddir/build/BUILD/pytorch/torch/csrc/autograd/generated/python_functions_1.cpp, /builddir/build/BUILD/pytorch/torch/csrc/autograd/generated/python_functions_2.cpp, /builddir/build/BUILD/pytorch/torch/csrc/autograd/generated/python_functions_3.cpp, /builddir/build/BUILD/pytorch/torch/csrc/autograd/generated/python_functions_4.cpp, /builddir/build/BUILD/pytorch/torch/csrc/autograd/generated/python_variable_methods.cpp, /builddir/build/BUILD/pytorch/torch/csrc/autograd/generated/python_torch_functions_0.cpp, /builddir/build/BUILD/pytorch/torch/csrc/autograd/generated/python_torch_functions_1.cpp, /builddir/build/BUILD/pytorch/torch/csrc/autograd/generated/python_torch_functions_2.cpp, /builddir/build/BUILD/pytorch/torch/csrc/autograd/generated/python_nn_functions.cpp, /builddir/build/BUILD/pytorch/torch/csrc/autograd/generated/python_fft_functions.cpp, /builddir/build/BUILD/pytorch/torch/csrc/autograd/generated/python_linalg_functions.cpp, /builddir/build/BUILD/pytorch/torch/csrc/autograd/generated/python_nested_functions.cpp, /builddir/build/BUILD/pytorch/torch/csrc/autograd/generated/python_sparse_functions.cpp, /builddir/build/BUILD/pytorch/torch/csrc/autograd/generated/python_special_functions.cpp, /builddir/build/BUILD/pytorch/torch/csrc/autograd/generated/python_return_types.cpp, /builddir/build/BUILD/pytorch/torch/csrc/autograd/generated/python_enum_tag.cpp, /builddir/build/BUILD/pytorch/torch/csrc/autograd/generated/python_functions.h, /builddir/build/BUILD/pytorch/torch/csrc/autograd/generated/python_return_types.h, /builddir/build/BUILD/pytorch/torch/testing/_internal/generated/annotated_fn_args.py, /builddir/build/BUILD/pytorch/torch/csrc/inductor/aoti_torch/generated/c_shim_cuda.cpp
[ 0%] Generating ATen sources
[ 0%] Building CXX object c10/CMakeFiles/c10.dir/core/GeneratorImpl.cpp.o
[ 0%] Building CXX object c10/CMakeFiles/c10.dir/core/GradMode.cpp.o
[ 0%] Generating ATen sources
[ 0%] Building CXX object c10/CMakeFiles/c10.dir/core/InferenceMode.cpp.o
[ 1%] Building CXX object c10/CMakeFiles/c10.dir/core/RefcountedDeleter.cpp.o
[ 1%] Building CXX object c10/CMakeFiles/c10.dir/core/SafePyObject.cpp.o
[ 1%] Building CXX object c10/CMakeFiles/c10.dir/core/Scalar.cpp.o
[ 1%] Building CXX object c10/CMakeFiles/c10.dir/core/ScalarType.cpp.o
[ 1%] Built target generate-torch-sources
[ 1%] Generating /builddir/build/BUILD/pytorch/torch/_C/__init__.pyi, /builddir/build/BUILD/pytorch/torch/_C/_VariableFunctions.pyi, /builddir/build/BUILD/pytorch/torch/nn/functional.pyi
[ 1%] Building CXX object c10/CMakeFiles/c10.dir/core/Storage.cpp.o
[ 1%] Generating ATen declarations_yaml
[ 1%] Building CXX object c10/CMakeFiles/c10.dir/core/StorageImpl.cpp.o
[ 1%] Building CXX object c10/CMakeFiles/c10.dir/core/Stream.cpp.o
[ 1%] Building CXX object c10/CMakeFiles/c10.dir/core/SymBool.cpp.o
[ 1%] Building CXX object c10/CMakeFiles/c10.dir/core/SymFloat.cpp.o
[ 1%] Generating /builddir/build/BUILD/pytorch/torch/utils/data/datapipes/datapipe.pyi
[ 1%] Built target torch_python_stubs
[ 1%] Generating /builddir/build/BUILD/pytorch/torch/version.py
[ 1%] Built target gen_torch_version
[ 1%] Building C object confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/init.c.o
[ 1%] Building C object confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/add.c.o
[ 1%] Building C object confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/average-pooling.c.o
[ 1%] Building C object confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/channel-shuffle.c.o
[ 1%] Building CXX object c10/CMakeFiles/c10.dir/core/SymInt.cpp.o
[ 1%] Building C object confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/clamp.c.o
[ 1%] Building CXX object confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/conv-prepack.cc.o
[ 1%] Building C object confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/convolution.c.o
[ 1%] Building C object confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/deconvolution.c.o
[ 1%] Building CXX object confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/fc-prepack.cc.o
[ 1%] Building C object confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/fully-connected.c.o
[ 1%] Building C object confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/fully-connected-sparse.c.o
[ 1%] Building C object confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/global-average-pooling.c.o
[ 1%] Building C object confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/hardsigmoid.c.o
[ 1%] Building C object confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/hardswish.c.o
[ 1%] Building C object confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/leaky-relu.c.o
[ 1%] Building C object confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/max-pooling.c.o
[ 1%] Building C object confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/sigmoid.c.o
[ 1%] Building C object confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/softargmax.c.o
[ 1%] Building C object confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/tanh.c.o
[ 1%] Building C object confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/operator-delete.c.o
[ 1%] Building CXX object confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/conv-run.cc.o
[ 1%] Building CXX object confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/deconv-run.cc.o
[ 1%] Building CXX object confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/fc-run.cc.o
[ 1%] Building CXX object confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/fc-unpack.cc.o
[ 1%] Building CXX object confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/fc-dynamic-run.cc.o
[ 1%] Building C object confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/indirection.c.o
[ 1%] Building CXX object c10/CMakeFiles/c10.dir/core/SymIntArrayRef.cpp.o
[ 1%] Building C object confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/operator-run.c.o
[ 1%] Building C object confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/u8lut32norm/scalar.c.o
[ 1%] Building C object confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/x8lut/scalar.c.o
[ 1%] Building C object confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/sgemm/6x8-psimd.c.o
[ 1%] Building CXX object c10/CMakeFiles/c10.dir/core/SymNodeImpl.cpp.o
[ 1%] Building C object confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/q8avgpool/mp8x9p8q-neon.c.o
[ 1%] Building C object confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/q8avgpool/up8x9-neon.c.o
[ 1%] Building CXX object c10/CMakeFiles/c10.dir/core/SymbolicShapeMeta.cpp.o
[ 1%] Building C object confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/q8avgpool/up8xm-neon.c.o
[ 1%] Building C object confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/q8conv/4x8-neon.c.o
[ 1%] Building C object confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/q8conv/8x8-neon.c.o
[ 1%] Building C object confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/q8dwconv/mp8x25-neon.c.o
[ 2%] Building C object confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/q8dwconv/mp8x25-neon-per-channel.c.o
[ 2%] Building C object confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/q8dwconv/mp8x27-neon.c.o
[ 2%] Building C object confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/q8dwconv/up8x9-neon.c.o
[ 2%] Building CXX object c10/CMakeFiles/c10.dir/core/TensorImpl.cpp.o
[ 2%] Building C object confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/q8dwconv/up8x9-neon-per-channel.c.o
[ 2%] Building C object confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/q8gavgpool/mp8x7p7q-neon.c.o
[ 2%] Building C object confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/q8gavgpool/up8x7-neon.c.o
[ 2%] Building C object confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/q8gavgpool/up8xm-neon.c.o
[ 2%] Building C object confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/q8gemm/4x-sumrows-neon.c.o
[ 2%] Building C object confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/q8gemm/4x8-neon.c.o
[ 2%] Building C object confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/q8gemm/4x8-dq-neon.c.o
[ 2%] Building C object confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/q8gemm/4x8c2-xzp-neon.c.o
[ 2%] Building C object confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/q8gemm/6x4-neon.c.o
[ 2%] Building C object confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/q8gemm/8x8-neon.c.o
[ 2%] Building C object confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/q8vadd/neon.c.o
[ 2%] Building C object confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/sgemm/5x8-neon.c.o
[ 2%] Building C object confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/sgemm/6x8-neon.c.o
[ 2%] Building C object confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/u8clamp/neon.c.o
[ 2%] Building CXX object c10/CMakeFiles/c10.dir/core/TensorOptions.cpp.o
[ 2%] Building C object confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/u8maxpool/16x9p8q-neon.c.o
[ 2%] Building C object confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/u8maxpool/sub16-neon.c.o
[ 2%] Building C object confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/u8rmax/neon.c.o
[ 2%] Building C object confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/x8zip/x2-neon.c.o
[ 2%] Building C object confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/x8zip/x3-neon.c.o
[ 2%] Building CXX object c10/CMakeFiles/c10.dir/core/UndefinedTensorImpl.cpp.o
[ 2%] Building C object confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/x8zip/x4-neon.c.o
[ 2%] Building C object confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/x8zip/xm-neon.c.o
[ 2%] Building ASM object confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/q8conv/8x8-aarch64-neon.S.o
[ 2%] Building ASM object confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/q8gemm/8x8-aarch64-neon.S.o
[ 2%] Building ASM object confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/q8gemm/8x8-dq-aarch64-neon.S.o
[ 2%] Building ASM object confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/q8gemm_sparse/8x4-packA-aarch64-neon.S.o
[ 2%] Building ASM object confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/q8gemm_sparse/8x8c1x4-dq-packedA-aarch64-neon.S.o
[ 2%] Building ASM object confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/q8gemm_sparse/8x8c8x1-dq-packedA-aarch64-neon.S.o
[ 2%] Linking CXX static library ../../lib/libpytorch_qnnpack.a
[ 2%] Built target pytorch_qnnpack
[ 2%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/cs16-bfly4/cs16-bfly4-samples1-scalar.c.o
[ 2%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/cs16-bfly4/cs16-bfly4-samples4-scalar.c.o
[ 2%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/cs16-bfly4/gen/cs16-bfly4-scalar-x1.c.o
[ 2%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/cs16-bfly4/gen/cs16-bfly4-scalar-x2.c.o
[ 2%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/cs16-bfly4/gen/cs16-bfly4-scalar-x4.c.o
[ 2%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/cs16-fftr/gen/cs16-fftr-scalar-x1.c.o
[ 2%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/cs16-fftr/gen/cs16-fftr-scalar-x2.c.o
[ 2%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/cs16-fftr/gen/cs16-fftr-scalar-x4.c.o
[ 2%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/cs16-vsquareabs/gen/cs16-vsquareabs-scalar-x1.c.o
[ 2%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/cs16-vsquareabs/gen/cs16-vsquareabs-scalar-x2.c.o
[ 2%] Building CXX object c10/CMakeFiles/c10.dir/core/WrapDimMinimal.cpp.o
[ 2%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/cs16-vsquareabs/gen/cs16-vsquareabs-scalar-x3.c.o
[ 2%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/cs16-vsquareabs/gen/cs16-vsquareabs-scalar-x4.c.o
[ 2%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-f32-vcvt/gen/f16-f32-vcvt-scalar-u1.c.o
[ 2%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-f32-vcvt/gen/f16-f32-vcvt-scalar-u2.c.o
[ 2%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-f32-vcvt/gen/f16-f32-vcvt-scalar-u3.c.o
[ 2%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-f32-vcvt/gen/f16-f32-vcvt-scalar-u4.c.o
[ 2%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-qs8-vcvt/gen/f16-qs8-vcvt-scalar-fmagic-u1.c.o
[ 2%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-qs8-vcvt/gen/f16-qs8-vcvt-scalar-fmagic-u2.c.o
[ 2%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-qs8-vcvt/gen/f16-qs8-vcvt-scalar-fmagic-u3.c.o
[ 2%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-qs8-vcvt/gen/f16-qs8-vcvt-scalar-fmagic-u4.c.o
[ 2%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-qs8-vcvt/gen/f16-qs8-vcvt-scalar-imagic-u1.c.o
[ 2%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-qs8-vcvt/gen/f16-qs8-vcvt-scalar-imagic-u2.c.o
[ 2%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-qs8-vcvt/gen/f16-qs8-vcvt-scalar-imagic-u3.c.o
[ 2%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-qs8-vcvt/gen/f16-qs8-vcvt-scalar-imagic-u4.c.o
[ 2%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-rminmax/gen/f16-rmax-scalar-u1.c.o
[ 2%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-rminmax/gen/f16-rmax-scalar-u2-acc2.c.o
[ 2%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-rminmax/gen/f16-rmax-scalar-u3-acc3.c.o
[ 2%] Building CXX object c10/CMakeFiles/c10.dir/core/impl/COW.cpp.o
[ 2%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-rminmax/gen/f16-rmax-scalar-u4-acc2.c.o
[ 2%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-rminmax/gen/f16-rmax-scalar-u4-acc4.c.o
[ 2%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-rminmax/gen/f16-rmin-scalar-u1.c.o
[ 2%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-rminmax/gen/f16-rmin-scalar-u2-acc2.c.o
[ 2%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-rminmax/gen/f16-rmin-scalar-u3-acc3.c.o
[ 2%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-rminmax/gen/f16-rmin-scalar-u4-acc2.c.o
[ 2%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-rminmax/gen/f16-rmin-scalar-u4-acc4.c.o
[ 3%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-rminmax/gen/f16-rminmax-scalar-u1.c.o
[ 3%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-rminmax/gen/f16-rminmax-scalar-u2-acc2.c.o
[ 3%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-rminmax/gen/f16-rminmax-scalar-u3-acc3.c.o
[ 3%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-rminmax/gen/f16-rminmax-scalar-u4-acc2.c.o
[ 3%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-rminmax/gen/f16-rminmax-scalar-u4-acc4.c.o
[ 3%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-argmaxpool/f32-argmaxpool-4x-scalar-c1.c.o
[ 3%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-argmaxpool/f32-argmaxpool-9p8x-scalar-c1.c.o
[ 3%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-argmaxpool/f32-argmaxpool-9x-scalar-c1.c.o
[ 3%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-avgpool/f32-avgpool-9p8x-minmax-scalar-c1.c.o
[ 3%] Building CXX object c10/CMakeFiles/c10.dir/core/impl/COWDeleter.cpp.o
[ 3%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-avgpool/f32-avgpool-9x-minmax-scalar-c1.c.o
[ 3%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-conv-hwc2chw/f32-conv-hwc2chw-3x3s2p1c3x4-scalar-1x1.c.o
[ 3%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-conv-hwc/f32-conv-hwc-3x3s2p0p1c3x4-scalar-1x1.c.o
[ 3%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-conv-hwc/f32-conv-hwc-3x3s2p1c3x4-scalar-1x1.c.o
[ 3%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-3x3p1-minmax-scalar-1x1-acc2.c.o
[ 3%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-3x3p1-minmax-scalar-1x1-acc3.c.o
[ 3%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-3x3p1-minmax-scalar-1x1-acc4.c.o
[ 3%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-3x3p1-minmax-scalar-1x1.c.o
[ 3%] Building CXX object c10/CMakeFiles/c10.dir/core/impl/DeviceGuardImplInterface.cpp.o
[ 3%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-3x3p1-minmax-scalar-2x1-acc2.c.o
[ 3%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-3x3p1-minmax-scalar-2x1.c.o
[ 3%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-3x3p1-minmax-scalar-3x1.c.o
[ 3%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-3x3p1-minmax-scalar-4x1.c.o
[ 3%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-3x3p1-minmax-scalar-5x1.c.o
[ 3%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-3x3p1-minmax-scalar-6x1.c.o
[ 3%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-3x3s2p1-minmax-scalar-1x1-acc2.c.o
[ 3%] Building CXX object c10/CMakeFiles/c10.dir/core/impl/GPUTrace.cpp.o
[ 3%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-3x3s2p1-minmax-scalar-1x1-acc3.c.o
[ 3%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-3x3s2p1-minmax-scalar-1x1-acc4.c.o
[ 3%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-3x3s2p1-minmax-scalar-1x1.c.o
[ 3%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-3x3s2p1-minmax-scalar-2x1-acc2.c.o
[ 3%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-3x3s2p1-minmax-scalar-2x1.c.o
[ 3%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-3x3s2p1-minmax-scalar-3x1.c.o
[ 3%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-3x3s2p1-minmax-scalar-4x1.c.o
[ 3%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-5x5p2-minmax-scalar-1x1-acc2.c.o
[ 3%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-5x5p2-minmax-scalar-1x1-acc3.c.o
[ 3%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-5x5p2-minmax-scalar-1x1-acc4.c.o
[ 3%] Building CXX object c10/CMakeFiles/c10.dir/core/impl/HermeticPyObjectTLS.cpp.o
[ 3%] Building CXX object c10/CMakeFiles/c10.dir/core/impl/LocalDispatchKeySet.cpp.o
[ 3%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-5x5p2-minmax-scalar-1x1-acc5.c.o
[ 3%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-5x5p2-minmax-scalar-1x1.c.o
[ 3%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-5x5p2-minmax-scalar-2x1-acc2.c.o
[ 3%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-5x5p2-minmax-scalar-2x1-acc3.c.o
[ 3%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-5x5p2-minmax-scalar-2x1.c.o
[ 3%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-5x5p2-minmax-scalar-3x1-acc2.c.o
[ 3%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-5x5p2-minmax-scalar-3x1.c.o
[ 3%] Building CXX object c10/CMakeFiles/c10.dir/core/impl/PyInterpreter.cpp.o
[ 3%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-5x5s2p2-minmax-scalar-1x1-acc2.c.o
[ 3%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-5x5s2p2-minmax-scalar-1x1-acc3.c.o
[ 3%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-5x5s2p2-minmax-scalar-1x1-acc4.c.o
[ 3%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-5x5s2p2-minmax-scalar-1x1-acc5.c.o
[ 3%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-5x5s2p2-minmax-scalar-1x1.c.o
[ 3%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-5x5s2p2-minmax-scalar-2x1-acc2.c.o
[ 3%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-5x5s2p2-minmax-scalar-2x1-acc3.c.o
[ 3%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-5x5s2p2-minmax-scalar-2x1.c.o
[ 3%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-5x5s2p2-minmax-scalar-3x1-acc2.c.o
[ 3%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-5x5s2p2-minmax-scalar-3x1.c.o
[ 3%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-2f2m2l1c1s1r-minmax-scalar-acc2.c.o
[ 3%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-2f2m2l1c1s1r-minmax-scalar.c.o
[ 3%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-2f2m2l1c1s1r-scalar-acc2.c.o
[ 3%] Building CXX object c10/CMakeFiles/c10.dir/core/impl/PyObjectSlot.cpp.o
[ 3%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-2f2m2l1c1s1r-scalar.c.o
[ 3%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-2f2m2l4c1s1r-minmax-scalar-acc2.c.o
[ 3%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-2f2m2l4c1s1r-minmax-scalar.c.o
[ 3%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-2f2m2l4c1s1r-scalar-acc2.c.o
[ 3%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-2f2m2l4c1s1r-scalar.c.o
[ 3%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-3f3m3l1c1s1r-scalar-acc2.c.o
[ 4%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-3f3m3l1c1s1r-scalar.c.o
[ 4%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-3p1c-minmax-scalar-acc2.c.o
[ 4%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-3p1c-minmax-scalar.c.o
[ 4%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-3p1c-scalar-acc2.c.o
[ 4%] Building CXX object c10/CMakeFiles/c10.dir/core/impl/PythonDispatcherTLS.cpp.o
[ 4%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-3p1c-scalar.c.o
[ 4%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-3p2c-minmax-scalar-acc2.c.o
[ 4%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-3p2c-minmax-scalar.c.o
[ 4%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-3p2c-scalar-acc2.c.o
[ 4%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-3p2c-scalar.c.o
[ 4%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-4p1c-minmax-scalar-acc2.c.o
[ 4%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-4p1c-minmax-scalar.c.o
[ 4%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-4p1c-scalar-acc2.c.o
[ 4%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-4p1c-scalar.c.o
[ 4%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-4p2c-minmax-scalar-acc2.c.o
[ 4%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-4p2c-minmax-scalar.c.o
[ 4%] Building CXX object c10/CMakeFiles/c10.dir/core/impl/SizesAndStrides.cpp.o
[ 4%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-4p2c-scalar-acc2.c.o
[ 4%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-4p2c-scalar.c.o
[ 4%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-5f5m5l1c1s1r-minmax-scalar-acc2.c.o
[ 4%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-5f5m5l1c1s1r-minmax-scalar.c.o
[ 4%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-5f5m5l1c1s1r-scalar-acc2.c.o
[ 4%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-5f5m5l1c1s1r-scalar.c.o
[ 4%] Building CXX object c10/CMakeFiles/c10.dir/core/impl/TorchDispatchModeTLS.cpp.o
[ 4%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-6f6m7l1c1s1r-minmax-scalar-acc2.c.o
[ 4%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-6f6m7l1c1s1r-minmax-scalar.c.o
[ 4%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-6f6m7l1c1s1r-scalar-acc2.c.o
[ 4%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-6f6m7l1c1s1r-scalar.c.o
[ 4%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-8f8m9l1c1s1r-minmax-scalar-acc2.c.o
[ 4%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-8f8m9l1c1s1r-minmax-scalar.c.o
[ 4%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-8f8m9l1c1s1r-scalar-acc2.c.o
[ 4%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-8f8m9l1c1s1r-scalar.c.o
[ 4%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-9p1c-minmax-scalar-acc2.c.o
[ 4%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-9p1c-minmax-scalar.c.o
[ 4%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-9p1c-scalar-acc2.c.o
[ 4%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-9p1c-scalar.c.o
[ 4%] Building CXX object c10/CMakeFiles/c10.dir/core/impl/alloc_cpu.cpp.o
[ 4%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-9p2c-minmax-scalar-acc2.c.o
[ 4%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-9p2c-minmax-scalar.c.o
[ 4%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-9p2c-scalar-acc2.c.o
[ 4%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-9p2c-scalar.c.o
[ 4%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-25p1c-minmax-scalar-acc2.c.o
[ 4%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-25p1c-minmax-scalar.c.o
[ 4%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-25p1c-scalar-acc2.c.o
[ 4%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-25p1c-scalar.c.o
[ 4%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-25p2c-minmax-scalar-acc2.c.o
[ 4%] Building CXX object c10/CMakeFiles/c10.dir/core/thread_pool.cpp.o
[ 4%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-25p2c-minmax-scalar.c.o
[ 4%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-25p2c-scalar-acc2.c.o
[ 4%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-25p2c-scalar.c.o
[ 4%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-f16-vcvt/gen/f32-f16-vcvt-scalar-bitcast-u1.c.o
[ 4%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-f16-vcvt/gen/f32-f16-vcvt-scalar-bitcast-u2.c.o
[ 4%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-f16-vcvt/gen/f32-f16-vcvt-scalar-bitcast-u3.c.o
[ 4%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-f16-vcvt/gen/f32-f16-vcvt-scalar-bitcast-u4.c.o
[ 4%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-f16-vcvt/gen/f32-f16-vcvt-scalar-fabsf-u1.c.o
[ 4%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-f16-vcvt/gen/f32-f16-vcvt-scalar-fabsf-u2.c.o
[ 4%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-f16-vcvt/gen/f32-f16-vcvt-scalar-fabsf-u3.c.o
[ 4%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-f16-vcvt/gen/f32-f16-vcvt-scalar-fabsf-u4.c.o
[ 4%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gavgpool-cw/f32-gavgpool-cw-scalar-u1.c.o
[ 4%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gavgpool/f32-gavgpool-7p7x-minmax-scalar-c1.c.o
[ 4%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gavgpool/f32-gavgpool-7x-minmax-scalar-c1.c.o
[ 4%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-1x4-minmax-scalar.c.o
[ 4%] Building CXX object c10/CMakeFiles/c10.dir/mobile/CPUCachingAllocator.cpp.o
[ 4%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-1x4-relu-scalar.c.o
[ 4%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-1x4-scalar.c.o
[ 4%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-2x4-minmax-scalar.c.o
[ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-2x4-relu-scalar.c.o
[ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-2x4-scalar.c.o
[ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-4x2-minmax-scalar.c.o
[ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-4x2-relu-scalar.c.o
[ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-4x2-scalar.c.o
[ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-4x4-minmax-scalar.c.o
[ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-4x4-relu-scalar.c.o
[ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-4x4-scalar.c.o
[ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemminc/gen/f32-gemminc-1x4-minmax-scalar.c.o
[ 5%] Building CXX object c10/CMakeFiles/c10.dir/mobile/CPUProfilingAllocator.cpp.o
[ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemminc/gen/f32-gemminc-2x4-minmax-scalar.c.o
[ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemminc/gen/f32-gemminc-4x4-minmax-scalar.c.o
[ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-ibilinear-chw/gen/f32-ibilinear-chw-scalar-p1.c.o
[ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-ibilinear-chw/gen/f32-ibilinear-chw-scalar-p2.c.o
[ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-ibilinear-chw/gen/f32-ibilinear-chw-scalar-p4.c.o
[ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-ibilinear/gen/f32-ibilinear-scalar-c1.c.o
[ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-ibilinear/gen/f32-ibilinear-scalar-c2.c.o
[ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-ibilinear/gen/f32-ibilinear-scalar-c4.c.o
[ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-1x4-minmax-scalar.c.o
[ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-1x4-relu-scalar.c.o
[ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-1x4-scalar.c.o
[ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-2x4-minmax-scalar.c.o
[ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-2x4-relu-scalar.c.o
[ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-2x4-scalar.c.o
[ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-4x2-minmax-scalar.c.o
[ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-4x2-relu-scalar.c.o
[ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-4x2-scalar.c.o
[ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-4x4-minmax-scalar.c.o
[ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-4x4-relu-scalar.c.o
[ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-4x4-scalar.c.o
[ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-maxpool/f32-maxpool-9p8x-minmax-scalar-c1.c.o
[ 5%] Building C object
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-pavgpool/f32-pavgpool-9p8x-minmax-scalar-c1.c.o [ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-pavgpool/f32-pavgpool-9x-minmax-scalar-c1.c.o [ 5%] Building CXX object c10/CMakeFiles/c10.dir/util/ApproximateClock.cpp.o [ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-ppmm/gen/f32-ppmm-2x4-minmax-scalar.c.o [ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-ppmm/gen/f32-ppmm-3x3-minmax-scalar.c.o [ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-ppmm/gen/f32-ppmm-4x2-minmax-scalar.c.o [ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-ppmm/gen/f32-ppmm-4x4-minmax-scalar.c.o [ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-prelu/gen/f32-prelu-scalar-2x1.c.o [ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-prelu/gen/f32-prelu-scalar-2x4.c.o [ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc4w-gemm/gen/f32-qc4w-gemm-1x4-minmax-scalar.c.o [ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc4w-gemm/gen/f32-qc4w-gemm-2x4-minmax-scalar.c.o [ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc4w-gemm/gen/f32-qc4w-gemm-4x2-minmax-scalar.c.o [ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc4w-gemm/gen/f32-qc4w-gemm-4x4-minmax-scalar.c.o [ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-1x4-minmax-scalar.c.o [ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-1x4-relu-scalar.c.o [ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-1x4-scalar.c.o [ 5%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-2x4-minmax-scalar.c.o [ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-2x4-relu-scalar.c.o [ 5%] Building CXX object c10/CMakeFiles/c10.dir/util/Backtrace.cpp.o [ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-2x4-scalar.c.o [ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-4x2-minmax-scalar.c.o [ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-4x2-relu-scalar.c.o [ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-4x2-scalar.c.o [ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-4x4-minmax-scalar.c.o [ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-4x4-relu-scalar.c.o [ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-4x4-scalar.c.o [ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-spmm/gen/f32-qc8w-spmm-1x1-minmax-scalar.c.o [ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-spmm/gen/f32-qc8w-spmm-2x1-minmax-scalar.c.o [ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-spmm/gen/f32-qc8w-spmm-4x1-minmax-scalar.c.o [ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-spmm/gen/f32-qc8w-spmm-8x1-minmax-scalar.c.o [ 5%] Building CXX object c10/CMakeFiles/c10.dir/util/Bfloat16.cpp.o [ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-spmm/gen/f32-qc8w-spmm-8x2-minmax-scalar.c.o [ 6%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-spmm/gen/f32-qc8w-spmm-8x4-minmax-scalar.c.o [ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qs8-vcvt/gen/f32-qs8-vcvt-scalar-fmagic-u1.c.o [ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qs8-vcvt/gen/f32-qs8-vcvt-scalar-fmagic-u2.c.o [ 6%] Building CXX object c10/CMakeFiles/c10.dir/util/C++17.cpp.o [ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qs8-vcvt/gen/f32-qs8-vcvt-scalar-fmagic-u3.c.o [ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qs8-vcvt/gen/f32-qs8-vcvt-scalar-fmagic-u4.c.o [ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qs8-vcvt/gen/f32-qs8-vcvt-scalar-imagic-u1.c.o [ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qs8-vcvt/gen/f32-qs8-vcvt-scalar-imagic-u2.c.o [ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qs8-vcvt/gen/f32-qs8-vcvt-scalar-imagic-u3.c.o [ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qs8-vcvt/gen/f32-qs8-vcvt-scalar-imagic-u4.c.o [ 6%] Building CXX object c10/CMakeFiles/c10.dir/util/DeadlockDetection.cpp.o [ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qs8-vcvt/gen/f32-qs8-vcvt-scalar-lrintf-u1.c.o [ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qs8-vcvt/gen/f32-qs8-vcvt-scalar-lrintf-u2.c.o [ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qs8-vcvt/gen/f32-qs8-vcvt-scalar-lrintf-u3.c.o [ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qs8-vcvt/gen/f32-qs8-vcvt-scalar-lrintf-u4.c.o [ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qu8-vcvt/gen/f32-qu8-vcvt-scalar-fmagic-u1.c.o [ 6%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qu8-vcvt/gen/f32-qu8-vcvt-scalar-fmagic-u2.c.o [ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qu8-vcvt/gen/f32-qu8-vcvt-scalar-fmagic-u3.c.o [ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qu8-vcvt/gen/f32-qu8-vcvt-scalar-fmagic-u4.c.o [ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qu8-vcvt/gen/f32-qu8-vcvt-scalar-imagic-u1.c.o [ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qu8-vcvt/gen/f32-qu8-vcvt-scalar-imagic-u2.c.o [ 6%] Building CXX object c10/CMakeFiles/c10.dir/util/Exception.cpp.o [ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qu8-vcvt/gen/f32-qu8-vcvt-scalar-imagic-u3.c.o [ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qu8-vcvt/gen/f32-qu8-vcvt-scalar-imagic-u4.c.o [ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qu8-vcvt/gen/f32-qu8-vcvt-scalar-lrintf-u1.c.o [ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qu8-vcvt/gen/f32-qu8-vcvt-scalar-lrintf-u2.c.o [ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qu8-vcvt/gen/f32-qu8-vcvt-scalar-lrintf-u3.c.o [ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qu8-vcvt/gen/f32-qu8-vcvt-scalar-lrintf-u4.c.o [ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddstoreexpminusmax/gen/f32-raddstoreexpminusmax-scalar-rr2-lut64-p2-u1.c.o [ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddstoreexpminusmax/gen/f32-raddstoreexpminusmax-scalar-rr2-lut64-p2-u2-acc2.c.o [ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddstoreexpminusmax/gen/f32-raddstoreexpminusmax-scalar-rr2-lut64-p2-u2.c.o [ 6%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddstoreexpminusmax/gen/f32-raddstoreexpminusmax-scalar-rr2-lut64-p2-u4-acc2.c.o [ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddstoreexpminusmax/gen/f32-raddstoreexpminusmax-scalar-rr2-lut64-p2-u4-acc4.c.o [ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddstoreexpminusmax/gen/f32-raddstoreexpminusmax-scalar-rr2-lut64-p2-u4.c.o [ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddstoreexpminusmax/gen/f32-raddstoreexpminusmax-scalar-rr2-p5-u1.c.o [ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddstoreexpminusmax/gen/f32-raddstoreexpminusmax-scalar-rr2-p5-u2-acc2.c.o [ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddstoreexpminusmax/gen/f32-raddstoreexpminusmax-scalar-rr2-p5-u2.c.o [ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddstoreexpminusmax/gen/f32-raddstoreexpminusmax-scalar-rr2-p5-u4-acc2.c.o [ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddstoreexpminusmax/gen/f32-raddstoreexpminusmax-scalar-rr2-p5-u4-acc4.c.o [ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddstoreexpminusmax/gen/f32-raddstoreexpminusmax-scalar-rr2-p5-u4.c.o [ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-rminmax/gen/f32-rmax-scalar-u1.c.o [ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-rminmax/gen/f32-rmax-scalar-u2-acc2.c.o [ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-rminmax/gen/f32-rmax-scalar-u3-acc3.c.o [ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-rminmax/gen/f32-rmax-scalar-u4-acc2.c.o [ 6%] Building CXX object c10/CMakeFiles/c10.dir/util/Float8_e4m3fn.cpp.o [ 6%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-rminmax/gen/f32-rmax-scalar-u4-acc4.c.o [ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-rminmax/gen/f32-rmin-scalar-u1.c.o [ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-rminmax/gen/f32-rmin-scalar-u2-acc2.c.o [ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-rminmax/gen/f32-rmin-scalar-u3-acc3.c.o [ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-rminmax/gen/f32-rmin-scalar-u4-acc2.c.o [ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-rminmax/gen/f32-rmin-scalar-u4-acc4.c.o [ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-rminmax/gen/f32-rminmax-scalar-u1.c.o [ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-rminmax/gen/f32-rminmax-scalar-u2-acc2.c.o [ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-rminmax/gen/f32-rminmax-scalar-u3-acc3.c.o [ 6%] Building CXX object c10/CMakeFiles/c10.dir/util/Float8_e4m3fnuz.cpp.o [ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-rminmax/gen/f32-rminmax-scalar-u4-acc2.c.o [ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-rminmax/gen/f32-rminmax-scalar-u4-acc4.c.o [ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-rsum/gen/f32-rsum-scalar-u1.c.o [ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-rsum/gen/f32-rsum-scalar-u2-acc2.c.o [ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-rsum/gen/f32-rsum-scalar-u3-acc3.c.o [ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-rsum/gen/f32-rsum-scalar-u4-acc2.c.o [ 6%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-rsum/gen/f32-rsum-scalar-u4-acc4.c.o [ 6%] Building CXX object c10/CMakeFiles/c10.dir/util/Float8_e5m2.cpp.o [ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-spmm/gen/f32-spmm-1x1-minmax-scalar-pipelined.c.o [ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-spmm/gen/f32-spmm-1x1-minmax-scalar.c.o [ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-spmm/gen/f32-spmm-2x1-minmax-scalar-pipelined.c.o [ 7%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-spmm/gen/f32-spmm-2x1-minmax-scalar.c.o [ 7%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-spmm/gen/f32-spmm-4x1-minmax-scalar-pipelined.c.o [ 7%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-spmm/gen/f32-spmm-4x1-minmax-scalar.c.o [ 7%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-spmm/gen/f32-spmm-8x1-minmax-scalar-pipelined.c.o [ 7%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-spmm/gen/f32-spmm-8x1-minmax-scalar.c.o [ 7%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-spmm/gen/f32-spmm-8x2-minmax-scalar.c.o [ 7%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-spmm/gen/f32-spmm-8x4-minmax-scalar.c.o [ 7%] Building CXX object c10/CMakeFiles/c10.dir/util/Float8_e5m2fnuz.cpp.o [ 7%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vadd-minmax-scalar-u1.c.o [ 7%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vadd-minmax-scalar-u2.c.o [ 7%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vadd-minmax-scalar-u4.c.o [ 7%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vadd-minmax-scalar-u8.c.o [ 7%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vadd-relu-scalar-u1.c.o [ 7%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vadd-relu-scalar-u2.c.o [ 7%] Building CXX object c10/CMakeFiles/c10.dir/util/Half.cpp.o [ 7%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vadd-relu-scalar-u4.c.o [ 7%] Built target ATEN_CPU_FILES_GEN_TARGET [ 7%] Built target ATEN_CUDA_FILES_GEN_TARGET [ 7%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vadd-relu-scalar-u8.c.o [ 7%] Building C object confu-deps/XNNPACK/CMakeFiles/hardware-config.dir/src/configs/hardware-config.c.o [ 7%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/amalgam/gen/scalar.c.o [ 7%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vadd-scalar-u1.c.o [ 7%] Built target hardware-config [ 7%] Building C object confu-deps/XNNPACK/CMakeFiles/indirection.dir/src/indirection.c.o [ 7%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vadd-scalar-u2.c.o [ 7%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vadd-scalar-u4.c.o [ 7%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vadd-scalar-u8.c.o [ 7%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vaddc-minmax-scalar-u1.c.o [ 7%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vaddc-minmax-scalar-u2.c.o [ 7%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vaddc-minmax-scalar-u4.c.o [ 7%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vaddc-minmax-scalar-u8.c.o [ 7%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vaddc-relu-scalar-u1.c.o [ 7%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vaddc-relu-scalar-u2.c.o [ 7%] Built target indirection [ 7%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vaddc-relu-scalar-u4.c.o [ 7%] Building CXX object c10/CMakeFiles/c10.dir/util/LeftRight.cpp.o [ 7%] Building CXX object confu-deps/XNNPACK/CMakeFiles/jit.dir/src/jit/aarch32-assembler.cc.o [ 7%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vaddc-relu-scalar-u8.c.o [ 7%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vaddc-scalar-u1.c.o [ 7%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vaddc-scalar-u2.c.o [ 7%] Building CXX object c10/CMakeFiles/c10.dir/util/Logging.cpp.o [ 7%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vaddc-scalar-u4.c.o [ 7%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vaddc-scalar-u8.c.o [ 7%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vdiv-minmax-scalar-u1.c.o [ 7%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vdiv-minmax-scalar-u2.c.o [ 7%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vdiv-minmax-scalar-u4.c.o [ 7%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vdiv-minmax-scalar-u8.c.o [ 7%] Building CXX object confu-deps/XNNPACK/CMakeFiles/jit.dir/src/jit/aarch64-assembler.cc.o [ 7%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vdiv-relu-scalar-u1.c.o [ 7%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vdiv-relu-scalar-u2.c.o [ 7%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vdiv-relu-scalar-u4.c.o [ 7%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vdiv-relu-scalar-u8.c.o [ 7%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vdiv-scalar-u1.c.o [ 7%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vdiv-scalar-u2.c.o [ 7%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vdiv-scalar-u4.c.o [ 7%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vdiv-scalar-u8.c.o [ 7%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vdivc-minmax-scalar-u1.c.o [ 7%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vdivc-minmax-scalar-u2.c.o [ 7%] Building CXX object confu-deps/XNNPACK/CMakeFiles/jit.dir/src/jit/assembler.cc.o [ 7%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vdivc-minmax-scalar-u4.c.o [ 7%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vdivc-minmax-scalar-u8.c.o [ 7%] Building CXX object confu-deps/XNNPACK/CMakeFiles/jit.dir/src/f16-gemm/gen/f16-gemm-1x16-aarch64-neonfp16arith-ld64.cc.o [ 7%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vdivc-relu-scalar-u1.c.o [ 7%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vdivc-relu-scalar-u2.c.o [ 7%] Building CXX object c10/CMakeFiles/c10.dir/util/MathConstants.cpp.o [ 7%] Building C 
object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vdivc-relu-scalar-u4.c.o [ 7%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vdivc-relu-scalar-u8.c.o [ 7%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vdivc-scalar-u1.c.o [ 7%] Building CXX object confu-deps/XNNPACK/CMakeFiles/jit.dir/src/f16-gemm/gen/f16-gemm-4x16-aarch64-neonfp16arith-ld64.cc.o [ 7%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vdivc-scalar-u2.c.o [ 7%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vdivc-scalar-u4.c.o [ 7%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vdivc-scalar-u8.c.o [ 7%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vmax-scalar-u1.c.o [ 7%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vmax-scalar-u2.c.o [ 7%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vmax-scalar-u4.c.o [ 7%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vmax-scalar-u8.c.o [ 7%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vmaxc-scalar-u1.c.o [ 8%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vmaxc-scalar-u2.c.o [ 8%] Building CXX object c10/CMakeFiles/c10.dir/util/Metaprogramming.cpp.o [ 8%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vmaxc-scalar-u4.c.o [ 8%] Building CXX object confu-deps/XNNPACK/CMakeFiles/jit.dir/src/f16-gemm/gen/f16-gemm-6x16-aarch64-neonfp16arith-cortex-a55.cc.o [ 8%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vmaxc-scalar-u8.c.o [ 8%] 
Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vmin-scalar-u1.c.o [ 8%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vmin-scalar-u2.c.o [ 8%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vmin-scalar-u4.c.o [ 8%] Building CXX object c10/CMakeFiles/c10.dir/util/Optional.cpp.o [ 8%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vmin-scalar-u8.c.o [ 8%] Building CXX object c10/CMakeFiles/c10.dir/util/ParallelGuard.cpp.o [ 8%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vminc-scalar-u1.c.o [ 8%] Building CXX object c10/CMakeFiles/c10.dir/util/SmallVector.cpp.o [ 8%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vminc-scalar-u2.c.o [ 8%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vminc-scalar-u4.c.o [ 8%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vminc-scalar-u8.c.o [ 8%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vmul-minmax-scalar-u1.c.o [ 8%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vmul-minmax-scalar-u2.c.o [ 8%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vmul-minmax-scalar-u4.c.o [ 8%] Building CXX object confu-deps/XNNPACK/CMakeFiles/jit.dir/src/f16-gemm/gen/f16-gemm-6x16-aarch64-neonfp16arith-cortex-a55r0.cc.o [ 8%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vmul-minmax-scalar-u8.c.o [ 8%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vmul-relu-scalar-u1.c.o [ 8%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vmul-relu-scalar-u2.c.o [ 8%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vmul-relu-scalar-u4.c.o [ 8%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vmul-relu-scalar-u8.c.o [ 8%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vmul-scalar-u1.c.o [ 8%] Building CXX object c10/CMakeFiles/c10.dir/util/StringUtil.cpp.o [ 8%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vmul-scalar-u2.c.o [ 8%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vmul-scalar-u4.c.o [ 8%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vmul-scalar-u8.c.o [ 8%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vmulc-minmax-scalar-u1.c.o [ 8%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vmulc-minmax-scalar-u2.c.o [ 8%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vmulc-minmax-scalar-u4.c.o [ 8%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vmulc-minmax-scalar-u8.c.o [ 8%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vmulc-relu-scalar-u1.c.o [ 8%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vmulc-relu-scalar-u2.c.o [ 8%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vmulc-relu-scalar-u4.c.o [ 8%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vmulc-relu-scalar-u8.c.o [ 8%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vmulc-scalar-u1.c.o [ 8%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vmulc-scalar-u2.c.o [ 8%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vmulc-scalar-u4.c.o [ 8%] Building CXX object c10/CMakeFiles/c10.dir/util/ThreadLocalDebugInfo.cpp.o [ 8%] Building CXX object confu-deps/XNNPACK/CMakeFiles/jit.dir/src/f16-gemm/gen/f16-gemm-6x16-aarch64-neonfp16arith-cortex-a75.cc.o [ 8%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vmulc-scalar-u8.c.o [ 8%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vrdivc-minmax-scalar-u1.c.o [ 8%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vrdivc-minmax-scalar-u2.c.o [ 8%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vrdivc-minmax-scalar-u4.c.o [ 8%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vrdivc-minmax-scalar-u8.c.o [ 8%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vrdivc-relu-scalar-u1.c.o [ 8%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vrdivc-relu-scalar-u2.c.o [ 8%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vrdivc-relu-scalar-u4.c.o [ 8%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vrdivc-relu-scalar-u8.c.o [ 8%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vrdivc-scalar-u1.c.o [ 8%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vrdivc-scalar-u2.c.o [ 8%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vrdivc-scalar-u4.c.o [ 8%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vrdivc-scalar-u8.c.o [ 8%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vrsubc-minmax-scalar-u1.c.o [ 8%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vrsubc-minmax-scalar-u2.c.o [ 8%] Building CXX object c10/CMakeFiles/c10.dir/util/TypeCast.cpp.o [ 8%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vrsubc-minmax-scalar-u4.c.o [ 8%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vrsubc-minmax-scalar-u8.c.o [ 8%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vrsubc-relu-scalar-u1.c.o [ 8%] Building CXX object confu-deps/XNNPACK/CMakeFiles/jit.dir/src/f16-gemm/gen/f16-gemm-6x16-aarch64-neonfp16arith-ld64.cc.o [ 8%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vrsubc-relu-scalar-u2.c.o [ 8%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vrsubc-relu-scalar-u4.c.o [ 8%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vrsubc-relu-scalar-u8.c.o [ 8%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vrsubc-scalar-u1.c.o [ 8%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vrsubc-scalar-u2.c.o [ 8%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vrsubc-scalar-u4.c.o [ 8%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vrsubc-scalar-u8.c.o [ 9%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vsqrdiff-scalar-u1.c.o
[ 9%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vsqrdiff-scalar-u2.c.o
[ 9%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vsqrdiff-scalar-u4.c.o
[ 9%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vsqrdiff-scalar-u8.c.o
[ 9%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/amalgam/gen/neon.c.o
[ 9%] Building CXX object c10/CMakeFiles/c10.dir/util/TypeList.cpp.o
[ 9%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vsqrdiffc-scalar-u1.c.o
[ 9%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vsqrdiffc-scalar-u2.c.o
[ 9%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vsqrdiffc-scalar-u4.c.o
[ 9%] Building CXX object confu-deps/XNNPACK/CMakeFiles/jit.dir/src/f16-igemm/gen/f16-igemm-1x16-aarch64-neonfp16arith-ld64.cc.o
[ 9%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vsqrdiffc-scalar-u8.c.o
[ 9%] Building CXX object c10/CMakeFiles/c10.dir/util/TypeTraits.cpp.o
[ 9%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vsub-minmax-scalar-u1.c.o
[ 9%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vsub-minmax-scalar-u2.c.o
[ 9%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vsub-minmax-scalar-u4.c.o
[ 9%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vsub-minmax-scalar-u8.c.o
[ 9%] Building CXX object confu-deps/XNNPACK/CMakeFiles/jit.dir/src/f16-igemm/gen/f16-igemm-4x16-aarch64-neonfp16arith-ld64.cc.o
[ 9%] Building CXX object
c10/CMakeFiles/c10.dir/util/Type_demangle.cpp.o
[ 9%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vsub-relu-scalar-u1.c.o
[ 9%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vsub-relu-scalar-u2.c.o
[ 9%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vsub-relu-scalar-u4.c.o
[ 9%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vsub-relu-scalar-u8.c.o
[ 9%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vsub-scalar-u1.c.o
[ 9%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vsub-scalar-u2.c.o
[ 9%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vsub-scalar-u4.c.o
[ 9%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vsub-scalar-u8.c.o
[ 9%] Building CXX object c10/CMakeFiles/c10.dir/util/Type_no_demangle.cpp.o
[ 9%] Building CXX object c10/CMakeFiles/c10.dir/util/Unicode.cpp.o
[ 9%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vsubc-minmax-scalar-u1.c.o
[ 9%] Building CXX object c10/CMakeFiles/c10.dir/util/UniqueVoidPtr.cpp.o
[ 9%] Building CXX object confu-deps/XNNPACK/CMakeFiles/jit.dir/src/f16-igemm/gen/f16-igemm-6x16-aarch64-neonfp16arith-cortex-a55.cc.o
[ 9%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vsubc-minmax-scalar-u2.c.o
[ 9%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vsubc-minmax-scalar-u4.c.o
[ 9%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vsubc-minmax-scalar-u8.c.o
[ 9%] Building CXX object c10/CMakeFiles/c10.dir/util/complex_math.cpp.o
[ 9%] Building C object
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vsubc-relu-scalar-u1.c.o
[ 9%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vsubc-relu-scalar-u2.c.o
[ 9%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vsubc-relu-scalar-u4.c.o
[ 9%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vsubc-relu-scalar-u8.c.o
[ 9%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vsubc-scalar-u1.c.o
[ 9%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vsubc-scalar-u2.c.o
[ 9%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vsubc-scalar-u4.c.o
[ 9%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vsubc-scalar-u8.c.o
[ 10%] Building CXX object c10/CMakeFiles/c10.dir/util/flags_use_gflags.cpp.o
[ 10%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vclamp/gen/f32-vclamp-scalar-u1.c.o
[ 10%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vclamp/gen/f32-vclamp-scalar-u2.c.o
[ 10%] Building CXX object confu-deps/XNNPACK/CMakeFiles/jit.dir/src/f16-igemm/gen/f16-igemm-6x16-aarch64-neonfp16arith-cortex-a55r0.cc.o
[ 10%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vclamp/gen/f32-vclamp-scalar-u4.c.o
[ 10%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vcmul/gen/f32-vcmul-scalar-u1.c.o
[ 10%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vcmul/gen/f32-vcmul-scalar-u2.c.o
[ 10%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vcmul/gen/f32-vcmul-scalar-u4.c.o
[ 10%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vcmul/gen/f32-vcmul-scalar-u8.c.o
[
10%] Building CXX object c10/CMakeFiles/c10.dir/util/flags_use_no_gflags.cpp.o
[ 10%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-velu/gen/f32-velu-scalar-rr2-lut16-p3-u1.c.o
[ 10%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-velu/gen/f32-velu-scalar-rr2-lut16-p3-u2.c.o
[ 10%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-velu/gen/f32-velu-scalar-rr2-lut16-p3-u3.c.o
[ 10%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-velu/gen/f32-velu-scalar-rr2-lut16-p3-u4.c.o
[ 10%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-velu/gen/f32-velu-scalar-rr2-lut16-p3-u5.c.o
[ 10%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-velu/gen/f32-velu-scalar-rr2-lut16-p3-u6.c.o
[ 10%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-velu/gen/f32-velu-scalar-rr2-p6-u1.c.o
[ 10%] Building CXX object c10/CMakeFiles/c10.dir/util/int128.cpp.o
[ 10%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-velu/gen/f32-velu-scalar-rr2-p6-u2.c.o
[ 10%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-velu/gen/f32-velu-scalar-rr2-p6-u3.c.o
[ 10%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-velu/gen/f32-velu-scalar-rr2-p6-u4.c.o
[ 10%] Building CXX object confu-deps/XNNPACK/CMakeFiles/jit.dir/src/f16-igemm/gen/f16-igemm-6x16-aarch64-neonfp16arith-cortex-a75.cc.o
[ 10%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-velu/gen/f32-velu-scalar-rr2-p6-u5.c.o
[ 10%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-velu/gen/f32-velu-scalar-rr2-p6-u6.c.o
[ 10%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vhswish/gen/f32-vhswish-scalar-u1.c.o
[ 10%] Building C object
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vhswish/gen/f32-vhswish-scalar-u2.c.o
[ 10%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vhswish/gen/f32-vhswish-scalar-u4.c.o
[ 10%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vlrelu/gen/f32-vlrelu-scalar-u1.c.o
[ 10%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vlrelu/gen/f32-vlrelu-scalar-u2.c.o
[ 10%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vlrelu/gen/f32-vlrelu-scalar-u4.c.o
[ 10%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vmulcaddc/gen/f32-vmulcaddc-c1-minmax-scalar-2x.c.o
[ 10%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vmulcaddc/gen/f32-vmulcaddc-c2-minmax-scalar-2x.c.o
[ 10%] Building CXX object c10/CMakeFiles/c10.dir/util/intrusive_ptr.cpp.o
[ 10%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vmulcaddc/gen/f32-vmulcaddc-c4-minmax-scalar-2x.c.o
[ 11%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vrelu/gen/f32-vrelu-scalar-u1.c.o
[ 11%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vrelu/gen/f32-vrelu-scalar-u2.c.o
[ 11%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vrelu/gen/f32-vrelu-scalar-u4.c.o
[ 11%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vrelu/gen/f32-vrelu-scalar-u8.c.o
[ 11%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vrnd/gen/f32-vrndd-scalar-libm-u1.c.o
[ 11%] Building CXX object confu-deps/XNNPACK/CMakeFiles/jit.dir/src/f16-igemm/gen/f16-igemm-6x16-aarch64-neonfp16arith-ld64.cc.o
[ 11%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vrnd/gen/f32-vrndd-scalar-libm-u2.c.o
[ 11%] Building C object
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vrnd/gen/f32-vrndd-scalar-libm-u4.c.o
[ 11%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vrnd/gen/f32-vrndne-scalar-libm-u1.c.o
[ 11%] Building CXX object c10/CMakeFiles/c10.dir/util/numa.cpp.o
[ 11%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vrnd/gen/f32-vrndne-scalar-libm-u2.c.o
[ 11%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vrnd/gen/f32-vrndne-scalar-libm-u4.c.o
[ 11%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vrnd/gen/f32-vrndu-scalar-libm-u1.c.o
[ 11%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vrnd/gen/f32-vrndu-scalar-libm-u2.c.o
[ 11%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vrnd/gen/f32-vrndu-scalar-libm-u4.c.o
[ 11%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vrnd/gen/f32-vrndz-scalar-libm-u1.c.o
[ 11%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vrnd/gen/f32-vrndz-scalar-libm-u2.c.o
[ 11%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vrnd/gen/f32-vrndz-scalar-libm-u4.c.o
[ 11%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vrsqrt/gen/f32-vrsqrt-scalar-rsqrt-u1.c.o
[ 11%] Building CXX object confu-deps/XNNPACK/CMakeFiles/jit.dir/src/f32-gemm/gen/f32-gemm-1x8-aarch64-neonfma-cortex-a53.cc.o
[ 11%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vrsqrt/gen/f32-vrsqrt-scalar-rsqrt-u2.c.o
[ 11%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vrsqrt/gen/f32-vrsqrt-scalar-rsqrt-u4.c.o
[ 11%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-scalar-rr2-lut64-p2-div-u1.c.o
[ 11%] Building CXX object c10/CMakeFiles/c10.dir/util/signal_handler.cpp.o
[ 11%]
Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-scalar-rr2-lut64-p2-div-u2.c.o
[ 11%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-scalar-rr2-lut64-p2-div-u4.c.o
[ 11%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-scalar-rr2-lut2048-p1-div-u1.c.o
[ 11%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-scalar-rr2-lut2048-p1-div-u2.c.o
[ 11%] Building CXX object confu-deps/XNNPACK/CMakeFiles/jit.dir/src/f32-gemm/gen/f32-gemm-1x8-aarch64-neonfma-cortex-a75.cc.o
[ 11%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-scalar-rr2-lut2048-p1-div-u4.c.o
[ 11%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-scalar-rr2-p5-div-u1.c.o
[ 11%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-scalar-rr2-p5-div-u2.c.o
[ 11%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-scalar-rr2-p5-div-u4.c.o
[ 11%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsqrt/gen/f32-vsqrt-scalar-sqrt-u1.c.o
[ 11%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsqrt/gen/f32-vsqrt-scalar-sqrt-u2.c.o
[ 11%] Building CXX object confu-deps/XNNPACK/CMakeFiles/jit.dir/src/f32-gemm/gen/f32-gemm-1x8-aarch64-neonfma-ld64.cc.o
[ 11%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsqrt/gen/f32-vsqrt-scalar-sqrt-u4.c.o
[ 11%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-scalar-expm1minus-rr1-lut8-p4h3ts-div-u1.c.o
[ 11%] Building C object
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-scalar-expm1minus-rr1-lut8-p4h3ts-div-u2.c.o
[ 11%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-scalar-expm1minus-rr1-lut8-p4h3ts-div-u4.c.o
[ 12%] Building CXX object confu-deps/XNNPACK/CMakeFiles/jit.dir/src/f32-gemm/gen/f32-gemm-4x8-aarch64-neonfma-cortex-a53.cc.o
[ 12%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-scalar-expm1minus-rr1-p6h5ts-div-u1.c.o
[ 12%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-scalar-expm1minus-rr1-p6h5ts-div-u2.c.o
[ 12%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-scalar-expm1minus-rr1-p6h5ts-div-u4.c.o
[ 12%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vunary/gen/f32-vabs-scalar-u1.c.o
[ 12%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vunary/gen/f32-vabs-scalar-u2.c.o
[ 12%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vunary/gen/f32-vabs-scalar-u4.c.o
[ 12%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vunary/gen/f32-vneg-scalar-u1.c.o
[ 12%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/amalgam/gen/neonfp16.c.o
[ 12%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vunary/gen/f32-vneg-scalar-u2.c.o
[ 12%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vunary/gen/f32-vneg-scalar-u4.c.o
[ 12%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vunary/gen/f32-vsqr-scalar-u1.c.o
[ 12%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vunary/gen/f32-vsqr-scalar-u2.c.o
[ 12%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/amalgam/gen/neonfma.c.o
[ 12%]
Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vunary/gen/f32-vsqr-scalar-u4.c.o
[ 12%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/i16-vlshift/gen/i16-vlshift-scalar-u1.c.o
[ 12%] Building CXX object confu-deps/XNNPACK/CMakeFiles/jit.dir/src/f32-gemm/gen/f32-gemm-4x8-aarch64-neonfma-cortex-a55.cc.o
[ 12%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/i16-vlshift/gen/i16-vlshift-scalar-u2.c.o
[ 12%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/i16-vlshift/gen/i16-vlshift-scalar-u3.c.o
[ 12%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/i16-vlshift/gen/i16-vlshift-scalar-u4.c.o
[ 12%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-expm1minus-scalar-rr2-lut4-p4.c.o
[ 12%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-expm1minus-scalar-rr2-lut8-p3.c.o
[ 12%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-expm1minus-scalar-rr2-lut8-p4.c.o
[ 12%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-expm1minus-scalar-rr2-lut16-p3.c.o
[ 12%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-expm1minus-scalar-rr2-lut16-p4.c.o
[ 12%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-expm1minus-scalar-rr2-p5.c.o
[ 12%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-expm1minus-scalar-rr2-p6.c.o
[ 12%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-expminus-scalar-rr2-lut64-p2.c.o
[ 12%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-expminus-scalar-rr2-lut2048-p1.c.o
[ 12%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-expminus-scalar-rr2-p5.c.o
[ 13%] Building C object
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-f16-cvt-scalar-bitcast.c.o
[ 13%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-f16-cvt-scalar-fabsf.c.o
[ 13%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-roundd-scalar-addsub.c.o
[ 13%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-roundd-scalar-cvt.c.o
[ 13%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-roundd-scalar-floor.c.o
[ 13%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-roundne-scalar-addsub.c.o
[ 13%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-roundne-scalar-nearbyint.c.o
[ 13%] Building CXX object confu-deps/XNNPACK/CMakeFiles/jit.dir/src/f32-gemm/gen/f32-gemm-4x8-aarch64-neonfma-cortex-a75.cc.o
[ 13%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-roundne-scalar-rint.c.o
[ 13%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-roundu-scalar-addsub.c.o
[ 13%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-roundu-scalar-ceil.c.o
[ 13%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-roundu-scalar-cvt.c.o
[ 13%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/amalgam/gen/neonv8.c.o
[ 13%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-roundz-scalar-addsub.c.o
[ 13%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-roundz-scalar-cvt.c.o
[ 13%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-roundz-scalar-trunc.c.o
[ 13%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-sigmoid-scalar-rr2-lut64-p2-div.c.o
[ 13%] Building C object
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-sigmoid-scalar-rr2-lut2048-p1-div.c.o
[ 13%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-sigmoid-scalar-rr2-p5-div.c.o
[ 13%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1minus-rr1-lut4-p4h2ts-div.c.o
[ 13%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1minus-rr1-lut4-p4h2ts-rcp.c.o
[ 13%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1minus-rr1-lut4-p4h3ps-div.c.o
[ 13%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1minus-rr1-lut4-p4h3ts-div.c.o
[ 13%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1minus-rr1-lut8-p3h1ts-div.c.o
[ 13%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1minus-rr1-lut8-p4h2ts-div.c.o
[ 13%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1minus-rr1-lut8-p4h2ts-rcp.c.o
[ 13%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1minus-rr1-lut8-p4h3ps-div.c.o
[ 13%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1minus-rr1-lut8-p4h3ps-rcp.c.o
[ 13%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1minus-rr1-lut8-p4h3ts-div.c.o
[ 13%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1minus-rr1-lut8-p4h3ts-rcp.c.o
[ 13%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1minus-rr1-lut16-p3h1ts-div.c.o
[ 13%] Building C object
confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/amalgam/gen/neon-aarch64.c.o
[ 13%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1minus-rr1-lut16-p4h2ts-div.c.o
[ 13%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1minus-rr1-lut16-p4h2ts-rcp.c.o
[ 13%] Building CXX object confu-deps/XNNPACK/CMakeFiles/jit.dir/src/f32-gemm/gen/f32-gemm-4x8-aarch64-neonfma-ld128.cc.o
[ 13%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1minus-rr1-lut16-p4h3ps-div.c.o
[ 13%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1minus-rr1-lut16-p4h3ts-div.c.o
[ 13%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1minus-rr1-lut32-p3h1ts-div.c.o
[ 13%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1minus-rr1-lut64-p3h1ts-div.c.o
[ 13%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1minus-rr1-p6h4ts-div.c.o
[ 13%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1minus-rr1-p6h5ps-div.c.o
[ 13%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/amalgam/gen/neonfma-aarch64.c.o
[ 13%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1minus-rr1-p6h5ps-rcp.c.o
[ 13%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1minus-rr1-p6h5ts-div.c.o
[ 13%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1minus-rr1-p6h5ts-rcp.c.o
[ 13%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1minus-rr2-lut4-p4h2ts-div.c.o
[ 13%] Building
CXX object confu-deps/XNNPACK/CMakeFiles/jit.dir/src/f32-gemm/gen/f32-gemm-6x8-aarch64-neonfma-cortex-a53.cc.o
[ 13%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1minus-rr2-lut4-p4h3ps-div.c.o
[ 13%] Building CXX object c10/CMakeFiles/c10.dir/util/tempfile.cpp.o
[ 13%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1minus-rr2-lut4-p4h3ts-div.c.o
[ 13%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1minus-rr2-lut8-p3h1ts-div.c.o
[ 13%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1minus-rr2-lut8-p4h2ts-div.c.o
[ 13%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1minus-rr2-lut8-p4h2ts-rcp.c.o
[ 13%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1minus-rr2-lut8-p4h3ps-div.c.o
[ 13%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1minus-rr2-lut8-p4h3ps-rcp.c.o
[ 13%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1minus-rr2-lut8-p4h3ts-div.c.o
[ 13%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1minus-rr2-lut8-p4h3ts-rcp.c.o
[ 13%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1minus-rr2-lut16-p3h1ts-div.c.o
[ 13%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1minus-rr2-lut16-p4h2ts-div.c.o
[ 13%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1minus-rr2-lut16-p4h3ps-div.c.o
[ 13%] Building C object
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1minus-rr2-lut16-p4h3ts-div.c.o
[ 13%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1minus-rr2-lut32-p3h1ts-div.c.o
[ 13%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1minus-rr2-lut64-p3h1ts-div.c.o
[ 13%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1minus-rr2-p6h4ts-div.c.o
[ 13%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1minus-rr2-p6h5ps-div.c.o
[ 13%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1minus-rr2-p6h5ts-div.c.o
[ 14%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1plus-rr1-lut4-p4h2ts-div.c.o
[ 14%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1plus-rr1-lut4-p4h3ps-div.c.o
[ 14%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1plus-rr1-lut4-p4h3ts-div.c.o
[ 14%] Building CXX object confu-deps/XNNPACK/CMakeFiles/jit.dir/src/f32-gemm/gen/f32-gemm-6x8-aarch64-neonfma-cortex-a55.cc.o
[ 14%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1plus-rr1-lut8-p3h1ts-div.c.o
[ 14%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1plus-rr1-lut8-p4h2ts-div.c.o
[ 14%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1plus-rr1-lut8-p4h3ps-div.c.o
[ 14%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/amalgam/gen/neonfp16arith.c.o
[ 14%] Building C object
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1plus-rr1-lut8-p4h3ts-div.c.o
[ 14%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1plus-rr1-lut16-p3h1ts-div.c.o
[ 14%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1plus-rr1-lut16-p4h2ts-div.c.o
[ 14%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1plus-rr1-lut16-p4h3ps-div.c.o
[ 14%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1plus-rr1-lut16-p4h3ts-div.c.o
[ 14%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1plus-rr1-lut32-p3h1ts-div.c.o
[ 14%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1plus-rr1-lut64-p3h1ts-div.c.o
[ 14%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1plus-rr1-p6h4ts-div.c.o
[ 14%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1plus-rr1-p6h5ps-div.c.o
[ 14%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1plus-rr1-p6h5ts-div.c.o
[ 14%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1plus-rr2-lut4-p4h2ts-div.c.o
[ 14%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1plus-rr2-lut4-p4h3ps-div.c.o
[ 14%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1plus-rr2-lut4-p4h3ts-div.c.o
[ 14%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1plus-rr2-lut8-p3h1ts-div.c.o
[ 14%] Building C object
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1plus-rr2-lut8-p4h2ts-div.c.o
[ 14%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1plus-rr2-lut8-p4h3ps-div.c.o
[ 14%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1plus-rr2-lut8-p4h3ts-div.c.o
[ 14%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1plus-rr2-lut16-p3h1ts-div.c.o
[ 14%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1plus-rr2-lut16-p4h2ts-div.c.o
[ 14%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1plus-rr2-lut16-p4h3ps-div.c.o
[ 14%] Building CXX object confu-deps/XNNPACK/CMakeFiles/jit.dir/src/f32-gemm/gen/f32-gemm-6x8-aarch64-neonfma-cortex-a75.cc.o
[ 14%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1plus-rr2-lut16-p4h3ts-div.c.o
[ 14%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1plus-rr2-lut32-p3h1ts-div.c.o
[ 14%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1plus-rr2-lut64-p3h1ts-div.c.o
[ 14%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1plus-rr2-p6h4ts-div.c.o
[ 14%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1plus-rr2-p6h5ps-div.c.o
[ 14%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1plus-rr2-p6h5ts-div.c.o
[ 14%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/u32-sqrt-scalar-bitmanip.c.o
[ 14%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/u32-sqrt-scalar-clz-binsearch.c.o
[
14%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/u32-sqrt-scalar-clz-newton.c.o
[ 14%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/u32-sqrt-scalar-cvti32-sqrt-lrint.c.o
[ 14%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/u32-sqrt-scalar-cvti64-sqrt-lrint.c.o
[ 14%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/u32-sqrt-scalar-cvti64-sqrtf-lrintf.c.o
[ 14%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/u32-sqrt-scalar-cvtu32-sqrt-lrint.c.o
[ 14%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/u32-sqrt-scalar-cvtu32-sqrtf-lrintf.c.o
[ 14%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/u32-sqrt-scalar-hashemian.c.o
[ 14%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/u32-sqrt-scalar-tflm.c.o
[ 14%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/u64-sqrt-scalar-cvtu32-sqrt-cvtsatu32f64.c.o
[ 14%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/u64-sqrt-scalar-cvtu32-sqrt-llrint.c.o
[ 14%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/u64-sqrt-scalar-cvtu64-sqrt-llrint.c.o
[ 14%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-1x1-minmax-scalar.c.o
[ 14%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-1x2-minmax-scalar.c.o
[ 14%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-1x4-minmax-scalar.c.o
[ 14%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-1x8-minmax-scalar.c.o
[ 14%] Building C object
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-2x2-minmax-scalar.c.o [ 14%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-2x4-minmax-scalar.c.o [ 14%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/amalgam/gen/neonfp16arith-aarch64.c.o [ 14%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-2x8-minmax-scalar.c.o [ 14%] Building CXX object confu-deps/XNNPACK/CMakeFiles/jit.dir/src/f32-gemm/gen/f32-gemm-6x8-aarch64-neonfma-ld128.cc.o [ 14%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-4x4-minmax-scalar.c.o [ 14%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/amalgam/gen/neondot.c.o [ 14%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-1x2-minmax-scalar.c.o [ 14%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-1x4-minmax-scalar.c.o [ 14%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-1x8-minmax-scalar.c.o [ 14%] Building CXX object c10/CMakeFiles/c10.dir/util/thread_name.cpp.o [ 14%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-2x2-minmax-scalar.c.o [ 14%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-2x4-minmax-scalar.c.o [ 14%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-2x8-minmax-scalar.c.o [ 14%] Building CXX object c10/CMakeFiles/c10.dir/util/typeid.cpp.o [ 14%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-4x4-minmax-scalar.c.o [ 14%] Building CXX object confu-deps/XNNPACK/CMakeFiles/jit.dir/src/f32-igemm/gen/f32-igemm-1x8-aarch64-neonfma-cortex-a53.cc.o [ 15%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-1x2-minmax-scalar.c.o [ 15%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-1x4-minmax-scalar.c.o [ 15%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-1x8-minmax-scalar.c.o [ 15%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-2x2-minmax-scalar.c.o [ 15%] Building CXX object confu-deps/XNNPACK/CMakeFiles/jit.dir/src/f32-igemm/gen/f32-igemm-1x8-aarch64-neonfma-cortex-a75.cc.o [ 15%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/amalgam/gen/neondot-aarch64.c.o [ 15%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-2x4-minmax-scalar.c.o [ 15%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-2x8-minmax-scalar.c.o [ 15%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/amalgam/gen/neondotfp16arith.c.o [ 15%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-4x4-minmax-scalar.c.o [ 15%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-5f5m5l1c1s1r-minmax-fp32-scalar-fmagic.c.o [ 15%] Building CXX object confu-deps/XNNPACK/CMakeFiles/jit.dir/src/f32-igemm/gen/f32-igemm-4x8-aarch64-neonfma-cortex-a53.cc.o [ 15%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-5f5m5l1c1s1r-minmax-fp32-scalar-imagic.c.o [ 15%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-5f5m5l1c1s1r-minmax-fp32-scalar-lrintf.c.o [ 15%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-5f5m5l2c1s1r-minmax-fp32-scalar-fmagic.c.o [ 15%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/amalgam/gen/neondotfp16-aarch64.c.o [ 15%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-5f5m5l2c1s1r-minmax-fp32-scalar-imagic.c.o [ 15%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f16-gemm/gen/f16-gemm-1x8-minmax-asm-aarch64-neonfp16arith-ld64.S.o [ 15%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f16-gemm/gen/f16-gemm-1x16-minmax-asm-aarch64-neonfp16arith-ld32.S.o [ 15%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-5f5m5l2c1s1r-minmax-fp32-scalar-lrintf.c.o [ 15%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f16-gemm/gen/f16-gemm-1x16-minmax-asm-aarch64-neonfp16arith-ld64.S.o [ 15%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f16-gemm/gen/f16-gemm-4x8-minmax-asm-aarch64-neonfp16arith-ld64.S.o [ 15%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f16-gemm/gen/f16-gemm-4x16-minmax-asm-aarch64-neonfp16arith-ld32.S.o [ 15%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f16-gemm/gen/f16-gemm-4x16-minmax-asm-aarch64-neonfp16arith-ld64.S.o [ 15%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f16-gemm/gen/f16-gemm-6x8-minmax-asm-aarch64-neonfp16arith-ld64.S.o [ 15%] Building ASM object 
confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f16-gemm/gen/f16-gemm-6x16-minmax-asm-aarch64-neonfp16arith-cortex-a55.S.o [ 15%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f16-gemm/gen/f16-gemm-6x16-minmax-asm-aarch64-neonfp16arith-cortex-a55r0.S.o [ 15%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-5f5m5l4c1s1r-minmax-fp32-scalar-fmagic.c.o [ 15%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f16-gemm/gen/f16-gemm-6x16-minmax-asm-aarch64-neonfp16arith-cortex-a75.S.o [ 15%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f16-gemm/gen/f16-gemm-6x16-minmax-asm-aarch64-neonfp16arith-ld32.S.o [ 15%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f16-gemm/gen/f16-gemm-6x16-minmax-asm-aarch64-neonfp16arith-ld64.S.o [ 15%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f16-gemm/gen/f16-gemm-8x8-minmax-asm-aarch64-neonfp16arith-ld64.S.o [ 15%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f16-gemm/gen/f16-gemminc-1x8-minmax-asm-aarch64-neonfp16arith-ld64.S.o [ 15%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f16-gemm/gen/f16-gemminc-1x16-minmax-asm-aarch64-neonfp16arith-ld32.S.o [ 15%] Building CXX object confu-deps/XNNPACK/CMakeFiles/jit.dir/src/f32-igemm/gen/f32-igemm-4x8-aarch64-neonfma-cortex-a55.cc.o [ 15%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f16-gemm/gen/f16-gemminc-4x8-minmax-asm-aarch64-neonfp16arith-ld64.S.o [ 15%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f16-gemm/gen/f16-gemminc-4x16-minmax-asm-aarch64-neonfp16arith-ld32.S.o [ 15%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f16-gemm/gen/f16-gemminc-6x8-minmax-asm-aarch64-neonfp16arith-ld64.S.o [ 15%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-5f5m5l4c1s1r-minmax-fp32-scalar-imagic.c.o [ 15%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f16-gemm/gen/f16-gemminc-6x16-minmax-asm-aarch64-neonfp16arith-cortex-a55.S.o [ 15%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f16-gemm/gen/f16-gemminc-6x16-minmax-asm-aarch64-neonfp16arith-cortex-a75.S.o [ 15%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f16-gemm/gen/f16-gemminc-6x16-minmax-asm-aarch64-neonfp16arith-ld32.S.o [ 15%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f16-gemm/gen/f16-gemminc-8x8-minmax-asm-aarch64-neonfp16arith-ld64.S.o [ 15%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f16-igemm/f16-igemm-1x16-minmax-asm-aarch64-neonfp16arith-ld32.S.o [ 15%] Linking CXX shared library ../lib/libc10.so [ 15%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f16-igemm/f16-igemm-1x16-minmax-asm-aarch64-neonfp16arith-ld64.S.o [ 15%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f16-igemm/f16-igemm-4x16-minmax-asm-aarch64-neonfp16arith-ld32.S.o [ 15%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f16-igemm/f16-igemm-4x16-minmax-asm-aarch64-neonfp16arith-ld64.S.o [ 15%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-5f5m5l4c1s1r-minmax-fp32-scalar-lrintf.c.o [ 15%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f16-igemm/f16-igemm-6x16-minmax-asm-aarch64-neonfp16arith-cortex-a55.S.o [ 15%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f16-igemm/f16-igemm-6x16-minmax-asm-aarch64-neonfp16arith-cortex-a55r0.S.o [ 15%] Building ASM object 
confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f16-igemm/f16-igemm-6x16-minmax-asm-aarch64-neonfp16arith-cortex-a75.S.o [ 15%] Built target c10 [ 15%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f16-igemm/f16-igemm-6x16-minmax-asm-aarch64-neonfp16arith-ld32.S.o [ 15%] Building C object confu-deps/XNNPACK/CMakeFiles/cache.dir/src/cache.c.o [ 15%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f16-igemm/f16-igemm-6x16-minmax-asm-aarch64-neonfp16arith-ld64.S.o [ 15%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-dwconv/f32-dwconv-9p4c-minmax-asm-aarch64-neonfma-cortex-a55.S.o [ 15%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-dwconv/f32-dwconv-9p4c-minmax-asm-aarch64-neonfma.S.o [ 15%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-gemm/gen/f32-gemm-1x8-minmax-asm-aarch64-neon-ld128-acc2-prfm.S.o [ 16%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-gemm/gen/f32-gemm-1x8-minmax-asm-aarch64-neon-ld128-acc2.S.o [ 16%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-6f6m7l1c1s1r-minmax-fp32-scalar-fmagic.c.o [ 16%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-gemm/gen/f32-gemm-1x8-minmax-asm-aarch64-neonfma-cortex-a53-prfm.S.o [ 16%] Built target cache [ 16%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-gemm/gen/f32-gemm-1x8-minmax-asm-aarch64-neonfma-cortex-a53.S.o [ 16%] Building C object confu-deps/XNNPACK/CMakeFiles/operators.dir/src/operator-delete.c.o [ 16%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-gemm/gen/f32-gemm-1x8-minmax-asm-aarch64-neonfma-cortex-a75-prfm.S.o [ 16%] Building ASM object 
confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-gemm/gen/f32-gemm-1x8-minmax-asm-aarch64-neonfma-cortex-a75.S.o [ 16%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-gemm/gen/f32-gemm-1x8-minmax-asm-aarch64-neonfma-ld64-acc2-prfm.S.o [ 16%] Building C object confu-deps/XNNPACK/CMakeFiles/operators.dir/src/operators/argmax-pooling-nhwc.c.o [ 16%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-6f6m7l1c1s1r-minmax-fp32-scalar-imagic.c.o [ 16%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-gemm/gen/f32-gemm-1x8-minmax-asm-aarch64-neonfma-ld64-acc2.S.o [ 16%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-gemm/gen/f32-gemm-1x8-minmax-asm-aarch64-neonfma-ld64-acc4-prfm.S.o [ 16%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-gemm/gen/f32-gemm-1x8-minmax-asm-aarch64-neonfma-ld64-acc4.S.o [ 16%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-gemm/gen/f32-gemm-1x8-minmax-asm-aarch64-neonfma-ld64-prfm.S.o [ 16%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-gemm/gen/f32-gemm-1x8-minmax-asm-aarch64-neonfma-ld64.S.o [ 16%] Building C object confu-deps/XNNPACK/CMakeFiles/operators.dir/src/operators/average-pooling-nhwc.c.o [ 16%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-6f6m7l1c1s1r-minmax-fp32-scalar-lrintf.c.o [ 16%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-gemm/gen/f32-gemm-1x8-minmax-asm-aarch64-neonfma-ld128-acc2-prfm.S.o [ 16%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-gemm/gen/f32-gemm-1x8-minmax-asm-aarch64-neonfma-ld128-acc2.S.o [ 16%] Building ASM object 
confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-gemm/gen/f32-gemm-1x8-minmax-asm-aarch64-neonfma-ld128-acc4-prfm.S.o [ 16%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-gemm/gen/f32-gemm-1x8-minmax-asm-aarch64-neonfma-ld128-acc4.S.o [ 16%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-gemm/gen/f32-gemm-1x8-minmax-asm-aarch64-neonfma-ld128-prfm.S.o [ 16%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-gemm/gen/f32-gemm-1x8-minmax-asm-aarch64-neonfma-ld128.S.o [ 16%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-6f6m7l2c1s1r-minmax-fp32-scalar-fmagic.c.o [ 16%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-gemm/gen/f32-gemm-1x12-minmax-asm-aarch64-neonfma-cortex-a53.S.o [ 16%] Building CXX object confu-deps/XNNPACK/CMakeFiles/jit.dir/src/f32-igemm/gen/f32-igemm-4x8-aarch64-neonfma-cortex-a75.cc.o [ 16%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-gemm/gen/f32-gemm-4x1-minmax-asm-aarch64-neonfma-ld64.S.o [ 16%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-gemm/gen/f32-gemm-4x1-minmax-asm-aarch64-neonfma-ld128.S.o [ 16%] Building C object confu-deps/XNNPACK/CMakeFiles/operators.dir/src/operators/batch-matrix-multiply-nc.c.o [ 16%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-gemm/gen/f32-gemm-4x2-minmax-asm-aarch64-neonfma-cortex-a75-prfm.S.o [ 16%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-gemm/gen/f32-gemm-4x2-minmax-asm-aarch64-neonfma-cortex-a75.S.o [ 16%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-gemm/gen/f32-gemm-4x2-minmax-asm-aarch64-neonfma-ld64.S.o [ 16%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-6f6m7l2c1s1r-minmax-fp32-scalar-imagic.c.o [ 16%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-gemm/gen/f32-gemm-4x2-minmax-asm-aarch64-neonfma-ld128.S.o [ 16%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-gemm/gen/f32-gemm-4x8-minmax-asm-aarch64-neonfma-cortex-a53-prfm.S.o [ 16%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-gemm/gen/f32-gemm-4x8-minmax-asm-aarch64-neonfma-cortex-a53.S.o [ 16%] Building C object confu-deps/XNNPACK/CMakeFiles/operators.dir/src/operators/binary-elementwise-nd.c.o [ 16%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-gemm/gen/f32-gemm-4x8-minmax-asm-aarch64-neonfma-cortex-a55.S.o [ 16%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-gemm/gen/f32-gemm-4x8-minmax-asm-aarch64-neonfma-cortex-a75-prfm.S.o [ 16%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-gemm/gen/f32-gemm-4x8-minmax-asm-aarch64-neonfma-cortex-a75.S.o [ 16%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-gemm/gen/f32-gemm-4x8-minmax-asm-aarch64-neonfma-ld64.S.o [ 16%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-6f6m7l2c1s1r-minmax-fp32-scalar-lrintf.c.o [ 16%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-gemm/gen/f32-gemm-4x8-minmax-asm-aarch64-neonfma-ld128.S.o [ 16%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-gemm/gen/f32-gemm-4x12-minmax-asm-aarch64-neonfma-cortex-a53.S.o [ 16%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-gemm/gen/f32-gemm-5x8-minmax-asm-aarch64-neonfma-cortex-a75-prfm.S.o [ 16%] Building ASM object 
confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-gemm/gen/f32-gemm-5x8-minmax-asm-aarch64-neonfma-cortex-a75.S.o [ 16%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-gemm/gen/f32-gemm-6x8-minmax-asm-aarch64-neonfma-cortex-a53-prfm.S.o [ 16%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-gemm/gen/f32-gemm-6x8-minmax-asm-aarch64-neonfma-cortex-a53.S.o [ 16%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-gemm/gen/f32-gemm-6x8-minmax-asm-aarch64-neonfma-cortex-a55.S.o [ 16%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-gemm/gen/f32-gemm-6x8-minmax-asm-aarch64-neonfma-cortex-a73.S.o [ 16%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-6f6m7l4c1s1r-minmax-fp32-scalar-fmagic.c.o [ 16%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-gemm/gen/f32-gemm-6x8-minmax-asm-aarch64-neonfma-cortex-a75-prfm.S.o [ 16%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-gemm/gen/f32-gemm-6x8-minmax-asm-aarch64-neonfma-cortex-a75.S.o [ 16%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-gemm/gen/f32-gemm-6x8-minmax-asm-aarch64-neonfma-ld64.S.o [ 16%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-gemm/gen/f32-gemm-6x8-minmax-asm-aarch64-neonfma-ld128.S.o [ 16%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-gemm/gen/f32-gemm-goi-1x8-minmax-asm-aarch64-neonfma-ld128-prfm.S.o [ 16%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-gemm/gen/f32-gemm-goi-1x8-minmax-asm-aarch64-neonfma-ld128.S.o [ 16%] Building C object confu-deps/XNNPACK/CMakeFiles/operators.dir/src/operators/channel-shuffle-nc.c.o [ 16%] Building ASM object 
confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-gemm/gen/f32-gemm-goi-4x8-minmax-asm-aarch64-neonfma-ld128.S.o [ 16%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-gemminc/gen/f32-gemminc-1x8-minmax-asm-aarch64-neonfma-cortex-a53.S.o [ 16%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-gemminc/gen/f32-gemminc-1x8-minmax-asm-aarch64-neonfma-cortex-a75-prfm.S.o [ 16%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-gemminc/gen/f32-gemminc-1x8-minmax-asm-aarch64-neonfma-cortex-a75.S.o [ 16%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-6f6m7l4c1s1r-minmax-fp32-scalar-imagic.c.o [ 17%] Building C object confu-deps/XNNPACK/CMakeFiles/operators.dir/src/operators/constant-pad-nd.c.o [ 17%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-gemminc/gen/f32-gemminc-1x8-minmax-asm-aarch64-neonfma-ld64.S.o [ 17%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-gemminc/gen/f32-gemminc-1x12-minmax-asm-aarch64-neonfma-cortex-a53.S.o [ 17%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-gemminc/gen/f32-gemminc-4x8-minmax-asm-aarch64-neonfma-cortex-a53.S.o [ 17%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-gemminc/gen/f32-gemminc-4x8-minmax-asm-aarch64-neonfma-cortex-a55.S.o [ 17%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-gemminc/gen/f32-gemminc-4x8-minmax-asm-aarch64-neonfma-cortex-a75-prfm.S.o [ 17%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-gemminc/gen/f32-gemminc-4x8-minmax-asm-aarch64-neonfma-cortex-a75.S.o [ 17%] Building C object confu-deps/XNNPACK/CMakeFiles/operators.dir/src/operators/convolution-nchw.c.o [ 17%] Building ASM object 
confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-gemminc/gen/f32-gemminc-4x8-minmax-asm-aarch64-neonfma-ld64.S.o [ 17%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-gemminc/gen/f32-gemminc-4x8-minmax-asm-aarch64-neonfma-ld128.S.o [ 17%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-gemminc/gen/f32-gemminc-4x12-minmax-asm-aarch64-neonfma-cortex-a53.S.o [ 17%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-gemminc/gen/f32-gemminc-5x8-minmax-asm-aarch64-neonfma-cortex-a75-prfm.S.o [ 17%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-6f6m7l4c1s1r-minmax-fp32-scalar-lrintf.c.o [ 17%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-gemminc/gen/f32-gemminc-5x8-minmax-asm-aarch64-neonfma-cortex-a75.S.o [ 18%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-gemminc/gen/f32-gemminc-6x8-minmax-asm-aarch64-neonfma-cortex-a53.S.o [ 18%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-gemminc/gen/f32-gemminc-6x8-minmax-asm-aarch64-neonfma-cortex-a55.S.o [ 18%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-gemminc/gen/f32-gemminc-6x8-minmax-asm-aarch64-neonfma-cortex-a73.S.o [ 18%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-gemminc/gen/f32-gemminc-6x8-minmax-asm-aarch64-neonfma-cortex-a75-prfm.S.o [ 18%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-gemminc/gen/f32-gemminc-6x8-minmax-asm-aarch64-neonfma-cortex-a75.S.o [ 18%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-gemminc/gen/f32-gemminc-6x8-minmax-asm-aarch64-neonfma-ld64.S.o [ 18%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-gemminc/gen/f32-gemminc-6x8-minmax-asm-aarch64-neonfma-ld128.S.o [ 
18%] Building C object confu-deps/XNNPACK/CMakeFiles/operators.dir/src/operators/convolution-nhwc.c.o [ 18%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-igemm/f32-igemm-1x12-minmax-asm-aarch64-neonfma-cortex-a53.S.o [ 18%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-8f8m9l1c1s1r-minmax-fp32-scalar-fmagic.c.o [ 18%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-igemm/f32-igemm-4x8-minmax-asm-aarch64-neonfma-cortex-a55.S.o [ 18%] Building CXX object confu-deps/XNNPACK/CMakeFiles/jit.dir/src/f32-igemm/gen/f32-igemm-4x8-aarch64-neonfma-ld128.cc.o [ 18%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-igemm/f32-igemm-4x12-minmax-asm-aarch64-neonfma-cortex-a53.S.o [ 18%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-igemm/f32-igemm-6x8-minmax-asm-aarch64-neonfma-cortex-a55.S.o [ 18%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-igemm/f32-igemm-6x8-minmax-asm-aarch64-neonfma-cortex-a73.S.o [ 18%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-igemm/gen/f32-igemm-1x8-minmax-asm-aarch64-neonfma-cortex-a53-prfm.S.o [ 18%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-igemm/gen/f32-igemm-1x8-minmax-asm-aarch64-neonfma-cortex-a53.S.o [ 18%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-8f8m9l1c1s1r-minmax-fp32-scalar-imagic.c.o [ 18%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-igemm/gen/f32-igemm-1x8-minmax-asm-aarch64-neonfma-cortex-a75-prfm.S.o [ 18%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-igemm/gen/f32-igemm-1x8-minmax-asm-aarch64-neonfma-cortex-a75.S.o [ 18%] Building ASM object 
confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-igemm/gen/f32-igemm-1x8-minmax-asm-aarch64-neonfma-ld64-prfm.S.o [ 18%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-igemm/gen/f32-igemm-1x8-minmax-asm-aarch64-neonfma-ld64.S.o [ 18%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-igemm/gen/f32-igemm-4x2-minmax-asm-aarch64-neonfma-cortex-a75-prfm.S.o [ 18%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-igemm/gen/f32-igemm-4x2-minmax-asm-aarch64-neonfma-cortex-a75.S.o [ 18%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-8f8m9l1c1s1r-minmax-fp32-scalar-lrintf.c.o [ 18%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-igemm/gen/f32-igemm-4x2-minmax-asm-aarch64-neonfma-ld64.S.o [ 18%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-igemm/gen/f32-igemm-4x8-minmax-asm-aarch64-neonfma-cortex-a53-prfm.S.o [ 18%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-igemm/gen/f32-igemm-4x8-minmax-asm-aarch64-neonfma-cortex-a53.S.o [ 18%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-igemm/gen/f32-igemm-4x8-minmax-asm-aarch64-neonfma-cortex-a75-prfm.S.o [ 18%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-igemm/gen/f32-igemm-4x8-minmax-asm-aarch64-neonfma-cortex-a75.S.o [ 18%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-igemm/gen/f32-igemm-4x8-minmax-asm-aarch64-neonfma-ld64.S.o [ 18%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-8f8m9l2c1s1r-minmax-fp32-scalar-fmagic.c.o [ 18%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-igemm/gen/f32-igemm-4x8-minmax-asm-aarch64-neonfma-ld128.S.o [ 18%] Building ASM object 
confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-igemm/gen/f32-igemm-5x8-minmax-asm-aarch64-neonfma-cortex-a75-prfm.S.o [ 18%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-igemm/gen/f32-igemm-5x8-minmax-asm-aarch64-neonfma-cortex-a75.S.o [ 18%] Building C object confu-deps/XNNPACK/CMakeFiles/operators.dir/src/operators/deconvolution-nhwc.c.o [ 18%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-igemm/gen/f32-igemm-6x8-minmax-asm-aarch64-neonfma-cortex-a53-prfm.S.o [ 18%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-igemm/gen/f32-igemm-6x8-minmax-asm-aarch64-neonfma-cortex-a53.S.o [ 18%] Building CXX object confu-deps/XNNPACK/CMakeFiles/jit.dir/src/f32-igemm/gen/f32-igemm-6x8-aarch64-neonfma-cortex-a53.cc.o [ 18%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-igemm/gen/f32-igemm-6x8-minmax-asm-aarch64-neonfma-cortex-a75-prfm.S.o [ 18%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-igemm/gen/f32-igemm-6x8-minmax-asm-aarch64-neonfma-cortex-a75.S.o [ 18%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-igemm/gen/f32-igemm-6x8-minmax-asm-aarch64-neonfma-ld64.S.o [ 18%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-8f8m9l2c1s1r-minmax-fp32-scalar-imagic.c.o [ 18%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-igemm/gen/f32-igemm-6x8-minmax-asm-aarch64-neonfma-ld128.S.o [ 18%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-ppmm/gen/f32-ppmm-4x8-minmax-asm-aarch64-neonfma-cortex-a75-prfm.S.o [ 18%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-ppmm/gen/f32-ppmm-4x8-minmax-asm-aarch64-neonfma-cortex-a75.S.o [ 18%] Building ASM object 
confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-ppmm/gen/f32-ppmm-4x8-minmax-asm-aarch64-neonfma-ld128-prfm.S.o [ 18%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-ppmm/gen/f32-ppmm-4x8-minmax-asm-aarch64-neonfma-ld128.S.o [ 18%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-ppmm/gen/f32-ppmm-8x8-minmax-asm-aarch64-neonfma-cortex-a75-prfm.S.o [ 18%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-ppmm/gen/f32-ppmm-8x8-minmax-asm-aarch64-neonfma-cortex-a75.S.o [ 18%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-ppmm/gen/f32-ppmm-8x8-minmax-asm-aarch64-neonfma-ld128-prfm.S.o [ 18%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-ppmm/gen/f32-ppmm-8x8-minmax-asm-aarch64-neonfma-ld128.S.o [ 18%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-8f8m9l2c1s1r-minmax-fp32-scalar-lrintf.c.o [ 18%] Building C object confu-deps/XNNPACK/CMakeFiles/operators.dir/src/operators/dynamic-fully-connected-nc.c.o [ 18%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-qc4w-gemm/gen/f32-qc4w-gemm-1x8-minmax-asm-aarch64-neon-ld128-acc2-prfm.S.o [ 18%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-qc4w-gemm/gen/f32-qc4w-gemm-1x8-minmax-asm-aarch64-neon-ld128-acc2.S.o [ 18%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-qc4w-gemm/gen/f32-qc4w-gemm-1x8-minmax-asm-aarch64-neonfma-ld64-acc2-prfm.S.o [ 18%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-qc4w-gemm/gen/f32-qc4w-gemm-1x8-minmax-asm-aarch64-neonfma-ld64-acc2.S.o [ 18%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-qc4w-gemm/gen/f32-qc4w-gemm-1x8-minmax-asm-aarch64-neonfma-ld64-acc4-prfm.S.o [ 18%] Building ASM object 
confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-qc4w-gemm/gen/f32-qc4w-gemm-1x8-minmax-asm-aarch64-neonfma-ld64-acc4.S.o [ 18%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-qc4w-gemm/gen/f32-qc4w-gemm-1x8-minmax-asm-aarch64-neonfma-ld64-prfm.S.o [ 18%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-qc4w-gemm/gen/f32-qc4w-gemm-1x8-minmax-asm-aarch64-neonfma-ld64.S.o [ 18%] Building C object confu-deps/XNNPACK/CMakeFiles/operators.dir/src/operators/fully-connected-nc.c.o [ 18%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-qc4w-gemm/gen/f32-qc4w-gemm-1x8-minmax-asm-aarch64-neonfma-ld128-acc2-prfm.S.o [ 18%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-8f8m9l4c1s1r-minmax-fp32-scalar-fmagic.c.o [ 18%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-qc4w-gemm/gen/f32-qc4w-gemm-1x8-minmax-asm-aarch64-neonfma-ld128-acc2.S.o [ 18%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-qc4w-gemm/gen/f32-qc4w-gemm-1x8-minmax-asm-aarch64-neonfma-ld128-acc4-prfm.S.o [ 18%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-qc4w-gemm/gen/f32-qc4w-gemm-1x8-minmax-asm-aarch64-neonfma-ld128-acc4.S.o [ 18%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-qc4w-gemm/gen/f32-qc4w-gemm-1x8-minmax-asm-aarch64-neonfma-ld128-prfm.S.o [ 18%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-qc4w-gemm/gen/f32-qc4w-gemm-1x8-minmax-asm-aarch64-neonfma-ld128.S.o [ 18%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-qc4w-gemm/gen/f32-qc4w-gemm-4x1-minmax-asm-aarch64-neonfma-ld64.S.o [ 18%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-qc4w-gemm/gen/f32-qc4w-gemm-4x1-minmax-asm-aarch64-neonfma-ld128.S.o [ 18%] 
Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-qc4w-gemm/gen/f32-qc4w-gemm-4x2-minmax-asm-aarch64-neonfma-ld64.S.o [ 19%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-qc4w-gemm/gen/f32-qc4w-gemm-4x2-minmax-asm-aarch64-neonfma-ld128.S.o [ 19%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-qc4w-gemm/gen/f32-qc4w-gemm-4x8-minmax-asm-aarch64-neonfma-ld64.S.o [ 19%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-qc4w-gemm/gen/f32-qc4w-gemm-4x8-minmax-asm-aarch64-neonfma-ld128.S.o [ 19%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-qc4w-gemm/gen/f32-qc4w-gemm-6x8-minmax-asm-aarch64-neonfma-ld64.S.o [ 19%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-8f8m9l4c1s1r-minmax-fp32-scalar-imagic.c.o [ 19%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-qc4w-gemm/gen/f32-qc4w-gemm-6x8-minmax-asm-aarch64-neonfma-ld128.S.o [ 19%] Building C object confu-deps/XNNPACK/CMakeFiles/operators.dir/src/operators/global-average-pooling-ncw.c.o [ 19%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-1x8-minmax-asm-aarch64-neon-ld128-acc2-prfm.S.o [ 19%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-1x8-minmax-asm-aarch64-neon-ld128-acc2.S.o [ 19%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-1x8-minmax-asm-aarch64-neonfma-ld64-acc2-prfm.S.o [ 19%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-1x8-minmax-asm-aarch64-neonfma-ld64-acc2.S.o [ 19%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-1x8-minmax-asm-aarch64-neonfma-ld64-acc4-prfm.S.o 
[ 19%] Building C object confu-deps/XNNPACK/CMakeFiles/operators.dir/src/operators/global-average-pooling-nwc.c.o [ 19%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-1x8-minmax-asm-aarch64-neonfma-ld64-acc4.S.o [ 19%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-1x8-minmax-asm-aarch64-neonfma-ld64-prfm.S.o [ 19%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-1x8-minmax-asm-aarch64-neonfma-ld64.S.o [ 19%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-1x8-minmax-asm-aarch64-neonfma-ld128-acc2-prfm.S.o [ 19%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-1x8-minmax-asm-aarch64-neonfma-ld128-acc2.S.o [ 19%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-1x8-minmax-asm-aarch64-neonfma-ld128-acc4-prfm.S.o [ 19%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-8f8m9l4c1s1r-minmax-fp32-scalar-lrintf.c.o [ 19%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-1x8-minmax-asm-aarch64-neonfma-ld128-acc4.S.o [ 19%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-1x8-minmax-asm-aarch64-neonfma-ld128-prfm.S.o [ 19%] Building C object confu-deps/XNNPACK/CMakeFiles/operators.dir/src/operators/lut-elementwise-nc.c.o [ 19%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-1x8-minmax-asm-aarch64-neonfma-ld128.S.o [ 19%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-4x1-minmax-asm-aarch64-neonfma-ld64.S.o [ 19%] Building CXX object 
confu-deps/XNNPACK/CMakeFiles/jit.dir/src/f32-igemm/gen/f32-igemm-6x8-aarch64-neonfma-cortex-a55.cc.o [ 19%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-4x1-minmax-asm-aarch64-neonfma-ld128.S.o [ 19%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-4x2-minmax-asm-aarch64-neonfma-ld64.S.o [ 19%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-4x2-minmax-asm-aarch64-neonfma-ld128.S.o [ 19%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-4x8-minmax-asm-aarch64-neonfma-ld64.S.o [ 19%] Building C object confu-deps/XNNPACK/CMakeFiles/operators.dir/src/operators/max-pooling-nhwc.c.o [ 19%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-4x8-minmax-asm-aarch64-neonfma-ld128.S.o [ 19%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-6x8-minmax-asm-aarch64-neonfma-ld64.S.o [ 19%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-6x8-minmax-asm-aarch64-neonfma-ld128.S.o [ 19%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/qd8-f16-qc8w-gemm/gen/qd8-f16-qc8w-gemm-4x16c4-minmax-asm-aarch64-neondot-ld128.S.o [ 19%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-9p1c-minmax-fp32-scalar-fmagic.c.o [ 19%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/qd8-f16-qc8w-gemm/gen/qd8-f16-qc8w-gemm-4x16c4-minmax-asm-aarch64-neondotfp16arith-cortex-a55.S.o [ 19%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/qd8-f16-qc8w-igemm/gen/qd8-f16-qc8w-igemm-4x16c4-minmax-asm-aarch64-neondot-cortex-a55.S.o [ 19%] Building ASM object 
confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/qd8-f16-qc8w-igemm/gen/qd8-f16-qc8w-igemm-4x16c4-minmax-asm-aarch64-neondot-ld128.S.o [ 19%] Building C object confu-deps/XNNPACK/CMakeFiles/operators.dir/src/operators/prelu-nc.c.o [ 19%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-4x16c4-minmax-asm-aarch64-neondot-cortex-a55.S.o [ 19%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-9p1c-minmax-fp32-scalar-imagic.c.o [ 19%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-4x16c4-minmax-asm-aarch64-neondot-ld64.S.o [ 19%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-4x16c4-minmax-asm-aarch64-neondot-ld128.S.o [ 19%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-4x16c4-minmax-asm-aarch64-neondot-cortex-a55.S.o [ 19%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-4x16c4-minmax-asm-aarch64-neondot-ld128.S.o [ 19%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-9p1c-minmax-fp32-scalar-lrintf.c.o [ 19%] Building C object confu-deps/XNNPACK/CMakeFiles/operators.dir/src/operators/reduce-nd.c.o [ 19%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-1x8c8-minmax-fp32-asm-aarch64-neon-mlal-cortex-a53-prfm.S.o [ 19%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-1x8c8-minmax-fp32-asm-aarch64-neon-mlal-cortex-a53.S.o [ 19%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-1x8c8-minmax-fp32-asm-aarch64-neon-mlal-prfm.S.o [ 19%] Building ASM 
object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-1x8c8-minmax-fp32-asm-aarch64-neon-mlal.S.o [ 19%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-9p1c-minmax-rndnu-scalar.c.o [ 19%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-1x16c4-minmax-fp32-asm-aarch64-neondot-ld32.S.o [ 19%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-1x16c4-minmax-fp32-asm-aarch64-neondot-ld64.S.o [ 19%] Building C object confu-deps/XNNPACK/CMakeFiles/operators.dir/src/operators/resize-bilinear-nchw.c.o [ 19%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-2x8c8-minmax-fp32-asm-aarch64-neon-mlal-cortex-a53-prfm.S.o [ 19%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-2x8c8-minmax-fp32-asm-aarch64-neon-mlal-cortex-a53.S.o [ 19%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-9p2c-minmax-fp32-scalar-fmagic.c.o [ 19%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-2x8c8-minmax-fp32-asm-aarch64-neon-mlal-prfm.S.o [ 19%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-2x8c8-minmax-fp32-asm-aarch64-neon-mlal.S.o [ 19%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-2x8c8-minmax-fp32-asm-aarch64-neon-mull.S.o [ 19%] Building C object confu-deps/XNNPACK/CMakeFiles/operators.dir/src/operators/resize-bilinear-nhwc.c.o [ 19%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-2x8c16-minmax-fp32-asm-aarch64-neon-mlal.S.o [ 19%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-9p2c-minmax-fp32-scalar-imagic.c.o [ 19%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-4x16-minmax-fp32-asm-aarch64-neon-mlal-lane-cortex-a53-prfm.S.o [ 19%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-4x16-minmax-fp32-asm-aarch64-neon-mlal-lane-cortex-a53.S.o [ 19%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-4x16-minmax-fp32-asm-aarch64-neon-mlal-lane-ld64-prfm.S.o [ 19%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-4x16-minmax-fp32-asm-aarch64-neon-mlal-lane-ld64.S.o [ 19%] Building C object confu-deps/XNNPACK/CMakeFiles/operators.dir/src/operators/rope-nthc.c.o [ 19%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-4x16c4-minmax-fp32-asm-aarch64-neondot-cortex-a55.S.o [ 19%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-9p2c-minmax-fp32-scalar-lrintf.c.o [ 19%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-4x16c4-minmax-fp32-asm-aarch64-neondot-ld32.S.o [ 19%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-4x16c4-minmax-fp32-asm-aarch64-neondot-ld64.S.o [ 19%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-4x16c4-minmax-fp32-asm-aarch64-neondot-ld128.S.o [ 19%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-1x8c8-minmax-fp32-asm-aarch64-neon-mlal-cortex-a53-prfm.S.o [ 19%] Building C object 
confu-deps/XNNPACK/CMakeFiles/operators.dir/src/operators/scaled-dot-product-attention-nhtc.c.o [ 19%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-1x8c8-minmax-fp32-asm-aarch64-neon-mlal-cortex-a53.S.o [ 19%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-9p2c-minmax-rndnu-scalar.c.o [ 19%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-1x8c8-minmax-fp32-asm-aarch64-neon-mlal-prfm.S.o [ 19%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-1x8c8-minmax-fp32-asm-aarch64-neon-mlal.S.o [ 20%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-2x8c8-minmax-fp32-asm-aarch64-neon-mlal-cortex-a53-prfm.S.o [ 20%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-2x8c8-minmax-fp32-asm-aarch64-neon-mlal-cortex-a53.S.o [ 20%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-2x8c8-minmax-fp32-asm-aarch64-neon-mlal-prfm.S.o [ 20%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-2x8c8-minmax-fp32-asm-aarch64-neon-mlal.S.o [ 20%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-9p4c-minmax-fp32-scalar-fmagic.c.o [ 20%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-2x8c16-minmax-fp32-asm-aarch64-neon-mlal.S.o [ 20%] Building C object confu-deps/XNNPACK/CMakeFiles/operators.dir/src/operators/slice-nd.c.o [ 20%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-4x16-minmax-fp32-asm-aarch64-neon-mlal-lane-cortex-a53-prfm.S.o [ 20%] 
Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-4x16-minmax-fp32-asm-aarch64-neon-mlal-lane-cortex-a53.S.o [ 20%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-4x16-minmax-fp32-asm-aarch64-neon-mlal-lane-ld64-prfm.S.o [ 20%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-4x16-minmax-fp32-asm-aarch64-neon-mlal-lane-ld64.S.o [ 20%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-9p4c-minmax-fp32-scalar-imagic.c.o [ 20%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-4x16c4-minmax-fp32-asm-aarch64-neondot-cortex-a55.S.o [ 20%] Building C object confu-deps/XNNPACK/CMakeFiles/operators.dir/src/operators/softmax-nc.c.o [ 20%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-4x16c4-minmax-fp32-asm-aarch64-neondot-ld64.S.o [ 20%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-4x16c4-minmax-fp32-asm-aarch64-neondot-ld128.S.o [ 20%] Building CXX object confu-deps/XNNPACK/CMakeFiles/jit.dir/src/f32-igemm/gen/f32-igemm-6x8-aarch64-neonfma-cortex-a75.cc.o [ 20%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/qu8-gemm/gen/qu8-gemm-4x8c4-minmax-rndnu-asm-aarch64-neondot-cortex-a55.S.o [ 20%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/qu8-gemm/gen/qu8-gemm-4x8c4-minmax-rndnu-asm-aarch64-neondot-ld128.S.o [ 20%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/qu8-gemm/gen/qu8-gemm-4x16-minmax-rndnu-asm-aarch64-neon-mlal-lane-cortex-a53-prfm.S.o [ 20%] Building ASM object 
confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/qu8-gemm/gen/qu8-gemm-4x16-minmax-rndnu-asm-aarch64-neon-mlal-lane-cortex-a53.S.o [ 20%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-9p4c-minmax-fp32-scalar-lrintf.c.o [ 20%] Building C object confu-deps/XNNPACK/CMakeFiles/operators.dir/src/operators/transpose-nd.c.o [ 20%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/qu8-gemm/gen/qu8-gemm-4x16-minmax-rndnu-asm-aarch64-neon-mlal-lane-cortex-a75-prfm.S.o [ 20%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/qu8-gemm/gen/qu8-gemm-4x16-minmax-rndnu-asm-aarch64-neon-mlal-lane-cortex-a75.S.o [ 20%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/qu8-gemm/gen/qu8-gemm-4x16-minmax-rndnu-asm-aarch64-neon-mlal-lane-ld64-prfm.S.o [ 20%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/qu8-gemm/gen/qu8-gemm-4x16-minmax-rndnu-asm-aarch64-neon-mlal-lane-ld64.S.o [ 20%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/qu8-gemm/gen/qu8-gemm-4x16c4-minmax-fp32-asm-aarch64-neondot-cortex-a55.S.o [ 20%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/qu8-gemm/gen/qu8-gemm-4x16c4-minmax-fp32-asm-aarch64-neondot-ld128.S.o [ 20%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-9p4c-minmax-rndnu-scalar.c.o [ 20%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/qu8-gemm/gen/qu8-gemm-4x16c4-minmax-rndnu-asm-aarch64-neondot-cortex-a55.S.o [ 20%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/qu8-gemm/gen/qu8-gemm-4x16c4-minmax-rndnu-asm-aarch64-neondot-ld128.S.o [ 20%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/qu8-igemm/gen/qu8-igemm-4x8c4-minmax-rndnu-asm-aarch64-neondot-cortex-a55.S.o [ 20%] Building ASM object 
confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/qu8-igemm/gen/qu8-igemm-4x8c4-minmax-rndnu-asm-aarch64-neondot-ld128.S.o [ 20%] Building C object confu-deps/XNNPACK/CMakeFiles/operators.dir/src/operators/unary-elementwise-nc.c.o [ 20%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/qu8-igemm/gen/qu8-igemm-4x16-minmax-rndnu-asm-aarch64-neon-mlal-lane-cortex-a53-prfm.S.o [ 20%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/qu8-igemm/gen/qu8-igemm-4x16-minmax-rndnu-asm-aarch64-neon-mlal-lane-cortex-a53.S.o [ 20%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-25p1c-minmax-fp32-scalar-fmagic.c.o [ 20%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/qu8-igemm/gen/qu8-igemm-4x16-minmax-rndnu-asm-aarch64-neon-mlal-lane-cortex-a75-prfm.S.o [ 20%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/qu8-igemm/gen/qu8-igemm-4x16-minmax-rndnu-asm-aarch64-neon-mlal-lane-cortex-a75.S.o [ 20%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/qu8-igemm/gen/qu8-igemm-4x16-minmax-rndnu-asm-aarch64-neon-mlal-lane-ld64-prfm.S.o [ 20%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/qu8-igemm/gen/qu8-igemm-4x16-minmax-rndnu-asm-aarch64-neon-mlal-lane-ld64.S.o [ 20%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/qu8-igemm/gen/qu8-igemm-4x16c4-minmax-fp32-asm-aarch64-neondot-cortex-a55.S.o [ 20%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-25p1c-minmax-fp32-scalar-imagic.c.o [ 20%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/qu8-igemm/gen/qu8-igemm-4x16c4-minmax-fp32-asm-aarch64-neondot-ld128.S.o [ 20%] Building ASM object 
confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/qu8-igemm/gen/qu8-igemm-4x16c4-minmax-rndnu-asm-aarch64-neondot-cortex-a55.S.o [ 20%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/qu8-igemm/gen/qu8-igemm-4x16c4-minmax-rndnu-asm-aarch64-neondot-ld128.S.o [ 20%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/tables/exp2-k-over-64.c.o [ 20%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/tables/exp2-k-over-2048.c.o [ 20%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-25p1c-minmax-fp32-scalar-lrintf.c.o [ 20%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/tables/exp2minus-k-over-4.c.o [ 20%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/tables/exp2minus-k-over-8.c.o [ 20%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/tables/exp2minus-k-over-16.c.o [ 20%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/tables/exp2minus-k-over-32.c.o [ 20%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/tables/exp2minus-k-over-64.c.o [ 20%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-25p2c-minmax-fp32-scalar-fmagic.c.o [ 20%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/tables/exp2minus-k-over-2048.c.o [ 20%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/tables/vlog.c.o [ 20%] Built target microkernels-prod [ 20%] Building CXX object c10/cuda/CMakeFiles/c10_cuda.dir/CUDAAllocatorConfig.cpp.o [ 20%] Building C object confu-deps/XNNPACK/CMakeFiles/operators.dir/src/operators/unpooling-nhwc.c.o [ 20%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-25p2c-minmax-fp32-scalar-imagic.c.o [ 20%] Built target operators [ 20%] Linking CXX static library 
../lib/libcaffe2_protos.a [ 20%] Built target caffe2_protos [ 20%] Building C object confu-deps/XNNPACK/CMakeFiles/subgraph.dir/src/memory-planner.c.o [ 20%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-25p2c-minmax-fp32-scalar-lrintf.c.o [ 20%] Building C object confu-deps/XNNPACK/CMakeFiles/subgraph.dir/src/runtime.c.o [ 20%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-25p4c-minmax-fp32-scalar-fmagic.c.o [ 20%] Building C object confu-deps/XNNPACK/CMakeFiles/subgraph.dir/src/subgraph.c.o [ 20%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-25p4c-minmax-fp32-scalar-imagic.c.o [ 20%] Building CXX object confu-deps/XNNPACK/CMakeFiles/jit.dir/src/f32-igemm/gen/f32-igemm-6x8-aarch64-neonfma-ld128.cc.o [ 20%] Building C object confu-deps/XNNPACK/CMakeFiles/subgraph.dir/src/subgraph/abs.c.o [ 20%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-25p4c-minmax-fp32-scalar-lrintf.c.o [ 20%] Building C object confu-deps/XNNPACK/CMakeFiles/subgraph.dir/src/subgraph/add2.c.o [ 20%] Building C object confu-deps/XNNPACK/CMakeFiles/subgraph.dir/src/subgraph/argmax-pooling-2d.c.o [ 20%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-f32-vcvt/gen/qs8-f32-vcvt-scalar-u1.c.o [ 20%] Building C object confu-deps/XNNPACK/CMakeFiles/subgraph.dir/src/subgraph/average-pooling-2d.c.o [ 20%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-f32-vcvt/gen/qs8-f32-vcvt-scalar-u2.c.o [ 20%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-f32-vcvt/gen/qs8-f32-vcvt-scalar-u3.c.o [ 20%] Building C object confu-deps/XNNPACK/CMakeFiles/subgraph.dir/src/subgraph/bankers-rounding.c.o [ 20%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-f32-vcvt/gen/qs8-f32-vcvt-scalar-u4.c.o [ 
20%] Building C object confu-deps/XNNPACK/CMakeFiles/subgraph.dir/src/subgraph/batch-matrix-multiply.c.o [ 20%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-gavgpool/gen/qs8-gavgpool-7p7x-minmax-fp32-scalar-fmagic-c1.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-gavgpool/gen/qs8-gavgpool-7p7x-minmax-fp32-scalar-fmagic-c2.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/subgraph.dir/src/subgraph/ceiling.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-gavgpool/gen/qs8-gavgpool-7p7x-minmax-fp32-scalar-fmagic-c4.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/subgraph.dir/src/subgraph/clamp.c.o [ 21%] Built target jit [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/subgraph.dir/src/subgraph/concatenate.c.o [ 21%] Building CXX object c10/cuda/CMakeFiles/c10_cuda.dir/CUDACachingAllocator.cpp.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-gavgpool/gen/qs8-gavgpool-7p7x-minmax-fp32-scalar-imagic-c1.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/subgraph.dir/src/subgraph/convert.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-gavgpool/gen/qs8-gavgpool-7p7x-minmax-fp32-scalar-imagic-c2.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/subgraph.dir/src/subgraph/convolution-2d.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/subgraph.dir/src/subgraph/copy.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-gavgpool/gen/qs8-gavgpool-7p7x-minmax-fp32-scalar-imagic-c4.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/subgraph.dir/src/subgraph/deconvolution-2d.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/subgraph.dir/src/subgraph/depth-to-space-2d.c.o [ 21%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-gavgpool/gen/qs8-gavgpool-7p7x-minmax-fp32-scalar-lrintf-c1.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/subgraph.dir/src/subgraph/depthwise-convolution-2d.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-gavgpool/gen/qs8-gavgpool-7p7x-minmax-fp32-scalar-lrintf-c2.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/subgraph.dir/src/subgraph/divide.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/subgraph.dir/src/subgraph/elu.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-gavgpool/gen/qs8-gavgpool-7p7x-minmax-fp32-scalar-lrintf-c4.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/subgraph.dir/src/subgraph/even-split.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/subgraph.dir/src/subgraph/floor.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-gavgpool/gen/qs8-gavgpool-7x-minmax-fp32-scalar-fmagic-c1.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-gavgpool/gen/qs8-gavgpool-7x-minmax-fp32-scalar-fmagic-c2.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/subgraph.dir/src/subgraph/fully-connected-sparse.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/subgraph.dir/src/subgraph/fully-connected.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-gavgpool/gen/qs8-gavgpool-7x-minmax-fp32-scalar-fmagic-c4.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/subgraph.dir/src/subgraph/global-average-pooling.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-gavgpool/gen/qs8-gavgpool-7x-minmax-fp32-scalar-imagic-c1.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-gavgpool/gen/qs8-gavgpool-7x-minmax-fp32-scalar-imagic-c2.c.o [ 21%] Building C object 
confu-deps/XNNPACK/CMakeFiles/subgraph.dir/src/subgraph/global-sum-pooling.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/subgraph.dir/src/subgraph/hardswish.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-gavgpool/gen/qs8-gavgpool-7x-minmax-fp32-scalar-imagic-c4.c.o [ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/subgraph.dir/src/subgraph/leaky-relu.c.o [ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/subgraph.dir/src/subgraph/max-pooling-2d.c.o [ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-gavgpool/gen/qs8-gavgpool-7x-minmax-fp32-scalar-lrintf-c1.c.o [ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-gavgpool/gen/qs8-gavgpool-7x-minmax-fp32-scalar-lrintf-c2.c.o [ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/subgraph.dir/src/subgraph/maximum2.c.o [ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/subgraph.dir/src/subgraph/minimum2.c.o [ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-gavgpool/gen/qs8-gavgpool-7x-minmax-fp32-scalar-lrintf-c4.c.o [ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/subgraph.dir/src/subgraph/multiply2.c.o [ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-3p1c-minmax-fp32-scalar-fmagic.c.o [ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/subgraph.dir/src/subgraph/negate.c.o [ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-3p2c-minmax-fp32-scalar-imagic.c.o [ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/subgraph.dir/src/subgraph/prelu.c.o [ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/subgraph.dir/src/subgraph/reshape-helpers.c.o [ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/subgraph.dir/src/subgraph/scaled-dot-product-attention.c.o [ 22%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-3p2c-minmax-fp32-scalar-lrintf.c.o
[ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/subgraph.dir/src/subgraph/sigmoid.c.o
[ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/subgraph.dir/src/subgraph/softmax.c.o
[ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-4p2c-minmax-fp32-scalar-imagic.c.o
[ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/subgraph.dir/src/subgraph/space-to-depth-2d.c.o
[ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/subgraph.dir/src/subgraph/square-root.c.o
[ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-5f5m5l1c1s1r-minmax-fp32-scalar-fmagic.c.o
[ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/subgraph.dir/src/subgraph/square.c.o
[ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/subgraph.dir/src/subgraph/squared-difference.c.o
[ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/subgraph.dir/src/subgraph/static-constant-pad.c.o
[ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-5f5m5l1c1s1r-minmax-fp32-scalar-imagic.c.o
[ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/subgraph.dir/src/subgraph/static-mean.c.o
[ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/subgraph.dir/src/subgraph/static-reshape.c.o
[ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-5f5m5l1c1s1r-minmax-fp32-scalar-lrintf.c.o
[ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/subgraph.dir/src/subgraph/static-resize-bilinear-2d.c.o
[ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/subgraph.dir/src/subgraph/static-slice.c.o
[ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-5f5m5l2c1s1r-minmax-fp32-scalar-fmagic.c.o
[ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/subgraph.dir/src/subgraph/static-transpose.c.o
[ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/subgraph.dir/src/subgraph/subtract.c.o
[ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/subgraph.dir/src/subgraph/tanh.c.o
[ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-5f5m5l2c1s1r-minmax-fp32-scalar-imagic.c.o
[ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/subgraph.dir/src/subgraph/unpooling-2d.c.o
[ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/subgraph.dir/src/subgraph/validation.c.o
[ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/subgraph.dir/src/tensor.c.o
[ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-5f5m5l2c1s1r-minmax-fp32-scalar-lrintf.c.o
[ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-5f5m5l4c1s1r-minmax-fp32-scalar-fmagic.c.o
[ 22%] Built target subgraph
[ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/XNNPACK.dir/src/configs/argmaxpool-config.c.o
[ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-5f5m5l4c1s1r-minmax-fp32-scalar-imagic.c.o
[ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/XNNPACK.dir/src/configs/avgpool-config.c.o
[ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-5f5m5l4c1s1r-minmax-fp32-scalar-lrintf.c.o
[ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/XNNPACK.dir/src/configs/binary-elementwise-config.c.o
[ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-6f6m7l1c1s1r-minmax-fp32-scalar-fmagic.c.o
[ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/XNNPACK.dir/src/configs/cmul-config.c.o
[ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-6f6m7l1c1s1r-minmax-fp32-scalar-imagic.c.o
[ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/XNNPACK.dir/src/configs/conv-hwc2chw-config.c.o
[ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-6f6m7l1c1s1r-minmax-fp32-scalar-lrintf.c.o
[ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/XNNPACK.dir/src/configs/dwconv-config.c.o
[ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/XNNPACK.dir/src/configs/dwconv2d-chw-config.c.o
[ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-6f6m7l2c1s1r-minmax-fp32-scalar-fmagic.c.o
[ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/XNNPACK.dir/src/configs/experiments-config.c.o
[ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/XNNPACK.dir/src/configs/gavgpool-config.c.o
[ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/XNNPACK.dir/src/configs/gavgpool-cw-config.c.o
[ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/XNNPACK.dir/src/configs/gemm-config.c.o
[ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/XNNPACK.dir/src/configs/ibilinear-chw-config.c.o
[ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/XNNPACK.dir/src/configs/ibilinear-config.c.o
[ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-6f6m7l2c1s1r-minmax-fp32-scalar-imagic.c.o
[ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/XNNPACK.dir/src/configs/lut32norm-config.c.o
[ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/XNNPACK.dir/src/configs/maxpool-config.c.o
[ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/XNNPACK.dir/src/configs/pavgpool-config.c.o
[ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-6f6m7l2c1s1r-minmax-fp32-scalar-lrintf.c.o
[ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/XNNPACK.dir/src/configs/prelu-config.c.o
[ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/XNNPACK.dir/src/configs/raddstoreexpminusmax-config.c.o
[ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/XNNPACK.dir/src/configs/reduce-config.c.o
[ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/XNNPACK.dir/src/configs/rmax-config.c.o
[ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/XNNPACK.dir/src/configs/spmm-config.c.o
[ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-6f6m7l4c1s1r-minmax-fp32-scalar-fmagic.c.o
[ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/XNNPACK.dir/src/configs/transpose-config.c.o
[ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/XNNPACK.dir/src/configs/unary-elementwise-config.c.o
[ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/XNNPACK.dir/src/configs/unpool-config.c.o
[ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/XNNPACK.dir/src/configs/vmulcaddc-config.c.o
[ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/XNNPACK.dir/src/configs/xx-fill-config.c.o
[ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-6f6m7l4c1s1r-minmax-fp32-scalar-imagic.c.o
[ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/XNNPACK.dir/src/configs/xx-pad-config.c.o
[ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/XNNPACK.dir/src/configs/x8-lut-config.c.o
[ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-6f6m7l4c1s1r-minmax-fp32-scalar-lrintf.c.o
[ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/XNNPACK.dir/src/configs/zip-config.c.o
[ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/XNNPACK.dir/src/init.c.o
[ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-8f8m9l1c1s1r-minmax-fp32-scalar-fmagic.c.o
[ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/XNNPACK.dir/src/params.c.o
[ 22%] Linking CXX static library ../../lib/libXNNPACK.a
[ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-8f8m9l1c1s1r-minmax-fp32-scalar-imagic.c.o
[ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-8f8m9l1c1s1r-minmax-fp32-scalar-lrintf.c.o
[ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-8f8m9l2c1s1r-minmax-fp32-scalar-fmagic.c.o
[ 22%] Built target XNNPACK
[ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-8f8m9l2c1s1r-minmax-fp32-scalar-imagic.c.o
[ 22%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/AccumulateType.cpp.o
[ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-8f8m9l2c1s1r-minmax-fp32-scalar-lrintf.c.o
[ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-8f8m9l4c1s1r-minmax-fp32-scalar-fmagic.c.o
[ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-8f8m9l4c1s1r-minmax-fp32-scalar-imagic.c.o
[ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-8f8m9l4c1s1r-minmax-fp32-scalar-lrintf.c.o
[ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-9p1c-minmax-fp32-scalar-fmagic.c.o
[ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-9p1c-minmax-fp32-scalar-imagic.c.o
[ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-9p1c-minmax-fp32-scalar-lrintf.c.o
[ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-9p2c-minmax-fp32-scalar-fmagic.c.o
[ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-9p2c-minmax-fp32-scalar-imagic.c.o
[ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-9p2c-minmax-fp32-scalar-lrintf.c.o
[ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-9p4c-minmax-fp32-scalar-fmagic.c.o
[ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-9p4c-minmax-fp32-scalar-imagic.c.o
[ 22%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/CPUGeneratorImpl.cpp.o
[ 22%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/CachedTensorUtils.cpp.o
[ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-9p4c-minmax-fp32-scalar-lrintf.c.o
[ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-25p1c-minmax-fp32-scalar-fmagic.c.o
[ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-25p1c-minmax-fp32-scalar-imagic.c.o
[ 23%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-25p1c-minmax-fp32-scalar-lrintf.c.o
[ 23%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-25p2c-minmax-fp32-scalar-fmagic.c.o
[ 23%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-25p2c-minmax-fp32-scalar-imagic.c.o
[ 23%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-25p2c-minmax-fp32-scalar-lrintf.c.o
[ 23%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-25p4c-minmax-fp32-scalar-fmagic.c.o
[ 23%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-25p4c-minmax-fp32-scalar-imagic.c.o
[ 23%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-25p4c-minmax-fp32-scalar-lrintf.c.o
[ 23%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-1x2-minmax-fp32-scalar-fmagic.c.o
[ 23%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-1x2-minmax-fp32-scalar-imagic.c.o
[ 23%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-1x2-minmax-fp32-scalar-lrintf.c.o
[ 23%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-1x4-minmax-fp32-scalar-fmagic.c.o
[ 23%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-1x4-minmax-fp32-scalar-imagic.c.o
[ 23%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-1x4-minmax-fp32-scalar-lrintf.c.o
[ 23%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/ConjugateFallback.cpp.o
[ 23%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-2x2-minmax-fp32-scalar-fmagic.c.o
[ 23%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-2x2-minmax-fp32-scalar-imagic.c.o
[ 23%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-2x2-minmax-fp32-scalar-lrintf.c.o
[ 23%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-2x4-minmax-fp32-scalar-fmagic.c.o
[ 23%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-2x4-minmax-fp32-scalar-imagic.c.o
[ 23%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-2x4-minmax-fp32-scalar-lrintf.c.o
[ 23%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-3x2-minmax-fp32-scalar-fmagic.c.o
[ 23%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-3x2-minmax-fp32-scalar-imagic.c.o
[ 23%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-3x2-minmax-fp32-scalar-lrintf.c.o
[ 23%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-3x4-minmax-fp32-scalar-fmagic.c.o
[ 23%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-3x4-minmax-fp32-scalar-imagic.c.o
[ 23%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-3x4-minmax-fp32-scalar-lrintf.c.o
[ 23%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-4x2-minmax-fp32-scalar-fmagic.c.o
[ 23%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-4x2-minmax-fp32-scalar-imagic.c.o
[ 23%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-4x2-minmax-fp32-scalar-lrintf.c.o
[ 23%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-4x4-minmax-fp32-scalar-fmagic.c.o
[ 23%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-4x4-minmax-fp32-scalar-imagic.c.o
[ 23%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-4x4-minmax-fp32-scalar-lrintf.c.o
[ 23%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-1x2-minmax-fp32-scalar-fmagic.c.o
[ 23%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-1x2-minmax-fp32-scalar-imagic.c.o
[ 23%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-1x2-minmax-fp32-scalar-lrintf.c.o
[ 23%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-1x4-minmax-fp32-scalar-fmagic.c.o
[ 23%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-1x4-minmax-fp32-scalar-imagic.c.o
[ 23%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-1x4-minmax-fp32-scalar-lrintf.c.o
[ 23%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-2x2-minmax-fp32-scalar-fmagic.c.o
[ 23%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-2x2-minmax-fp32-scalar-imagic.c.o
[ 23%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-2x2-minmax-fp32-scalar-lrintf.c.o
[ 23%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-2x4-minmax-fp32-scalar-fmagic.c.o
[ 23%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-2x4-minmax-fp32-scalar-imagic.c.o
[ 23%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-2x4-minmax-fp32-scalar-lrintf.c.o
[ 23%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-3x2-minmax-fp32-scalar-fmagic.c.o
[ 23%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-3x2-minmax-fp32-scalar-imagic.c.o
[ 23%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-3x2-minmax-fp32-scalar-lrintf.c.o
[ 23%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-3x4-minmax-fp32-scalar-fmagic.c.o
[ 23%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-3x4-minmax-fp32-scalar-imagic.c.o
[ 23%] Building CXX object c10/cuda/CMakeFiles/c10_cuda.dir/CUDADeviceAssertionHost.cpp.o
[ 23%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-3x4-minmax-fp32-scalar-lrintf.c.o
[ 23%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-4x2-minmax-fp32-scalar-fmagic.c.o
[ 23%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-4x2-minmax-fp32-scalar-imagic.c.o
[ 23%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-4x2-minmax-fp32-scalar-lrintf.c.o
[ 23%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/Context.cpp.o
[ 23%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-4x4-minmax-fp32-scalar-fmagic.c.o
[ 23%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-4x4-minmax-fp32-scalar-imagic.c.o
[ 23%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-4x4-minmax-fp32-scalar-lrintf.c.o
[ 23%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-requantization/qs8-requantization-fp32-scalar-fmagic.c.o
[ 23%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-requantization/qs8-requantization-fp32-scalar-lrintf.c.o
[ 23%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-requantization/qs8-requantization-gemmlowp-scalar.c.o
[ 23%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-requantization/qs8-requantization-rndna-scalar-signed64.c.o
[ 23%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-requantization/qs8-requantization-rndna-scalar-unsigned32.c.o
[ 24%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-requantization/qs8-requantization-rndna-scalar-unsigned64.c.o
[ 24%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-requantization/qs8-requantization-rndnu-scalar.c.o
[ 24%] Building CXX object c10/cuda/CMakeFiles/c10_cuda.dir/CUDAException.cpp.o
[ 24%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vadd/gen/qs8-vadd-minmax-scalar-u1.c.o
[ 24%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vadd/gen/qs8-vadd-minmax-scalar-u2.c.o
[ 24%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vadd/gen/qs8-vadd-minmax-scalar-u4.c.o
[ 24%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vaddc/gen/qs8-vaddc-minmax-scalar-u1.c.o
[ 24%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vaddc/gen/qs8-vaddc-minmax-scalar-u2.c.o
[ 24%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vaddc/gen/qs8-vaddc-minmax-scalar-u4.c.o
[ 24%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vcvt/gen/qs8-vcvt-scalar-u1.c.o
[ 24%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vcvt/gen/qs8-vcvt-scalar-u2.c.o
[ 24%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vcvt/gen/qs8-vcvt-scalar-u4.c.o
[ 24%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vhswish/gen/qs8-vhswish-scalar-u1.c.o
[ 24%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vhswish/gen/qs8-vhswish-scalar-u2.c.o
[ 24%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vhswish/gen/qs8-vhswish-scalar-u4.c.o
[ 24%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vlrelu/gen/qs8-vlrelu-scalar-andxor-u1.c.o
[ 24%] Building CXX object c10/cuda/CMakeFiles/c10_cuda.dir/CUDAFunctions.cpp.o
[ 24%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vlrelu/gen/qs8-vlrelu-scalar-andxor-u2.c.o
[ 24%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vlrelu/gen/qs8-vlrelu-scalar-andxor-u4.c.o
[ 24%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vlrelu/gen/qs8-vlrelu-scalar-select-u1.c.o
[ 24%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vlrelu/gen/qs8-vlrelu-scalar-select-u2.c.o
[ 24%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vlrelu/gen/qs8-vlrelu-scalar-select-u4.c.o
[ 24%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vmul/gen/qs8-vmul-minmax-fp32-scalar-u1.c.o
[ 24%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/DLConvertor.cpp.o
[ 24%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vmul/gen/qs8-vmul-minmax-fp32-scalar-u2.c.o
[ 24%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vmul/gen/qs8-vmul-minmax-fp32-scalar-u4.c.o
[ 24%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vmulc/gen/qs8-vmulc-minmax-fp32-scalar-u1.c.o
[ 24%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/DeviceAccelerator.cpp.o
[ 24%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vmulc/gen/qs8-vmulc-minmax-fp32-scalar-u2.c.o
[ 24%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vmulc/gen/qs8-vmulc-minmax-fp32-scalar-u4.c.o
[ 24%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs16-qs8-vcvt/gen/qs16-qs8-vcvt-scalar-u1.c.o
[ 24%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs16-qs8-vcvt/gen/qs16-qs8-vcvt-scalar-u2.c.o
[ 24%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs16-qs8-vcvt/gen/qs16-qs8-vcvt-scalar-u4.c.o
[ 24%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-avgpool/qu8-avgpool-9p8x-minmax-fp32-scalar-imagic-c1.c.o
[ 24%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-avgpool/qu8-avgpool-9x-minmax-fp32-scalar-imagic-c1.c.o
[ 24%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-5f5m5l1c1s1r-minmax-fp32-scalar-fmagic.c.o
[ 24%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-5f5m5l1c1s1r-minmax-fp32-scalar-imagic.c.o
[ 24%] Building CXX object c10/cuda/CMakeFiles/c10_cuda.dir/CUDAMallocAsyncAllocator.cpp.o
[ 24%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-5f5m5l1c1s1r-minmax-fp32-scalar-lrintf.c.o
[ 24%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-5f5m5l2c1s1r-minmax-fp32-scalar-fmagic.c.o
[ 24%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-5f5m5l2c1s1r-minmax-fp32-scalar-imagic.c.o
[ 24%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-5f5m5l2c1s1r-minmax-fp32-scalar-lrintf.c.o
[ 24%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-5f5m5l4c1s1r-minmax-fp32-scalar-fmagic.c.o
[ 24%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-5f5m5l4c1s1r-minmax-fp32-scalar-imagic.c.o
[ 24%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/Dispatch.cpp.o
[ 24%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-5f5m5l4c1s1r-minmax-fp32-scalar-lrintf.c.o
[ 24%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-6f6m7l1c1s1r-minmax-fp32-scalar-fmagic.c.o
[ 24%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-6f6m7l1c1s1r-minmax-fp32-scalar-imagic.c.o
[ 24%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-6f6m7l1c1s1r-minmax-fp32-scalar-lrintf.c.o
[ 24%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-6f6m7l2c1s1r-minmax-fp32-scalar-fmagic.c.o
[ 24%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-6f6m7l2c1s1r-minmax-fp32-scalar-imagic.c.o
[ 24%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-6f6m7l2c1s1r-minmax-fp32-scalar-lrintf.c.o
[ 24%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-6f6m7l4c1s1r-minmax-fp32-scalar-fmagic.c.o
[ 24%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-6f6m7l4c1s1r-minmax-fp32-scalar-imagic.c.o
[ 24%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/DynamicLibrary.cpp.o
[ 24%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-6f6m7l4c1s1r-minmax-fp32-scalar-lrintf.c.o
[ 24%] Building CXX object c10/cuda/CMakeFiles/c10_cuda.dir/CUDAMiscFunctions.cpp.o
[ 24%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-8f8m9l1c1s1r-minmax-fp32-scalar-fmagic.c.o
[ 24%] Building CXX object c10/cuda/CMakeFiles/c10_cuda.dir/CUDAStream.cpp.o
[ 24%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-8f8m9l1c1s1r-minmax-fp32-scalar-imagic.c.o
[ 24%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-8f8m9l1c1s1r-minmax-fp32-scalar-lrintf.c.o
[ 24%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-8f8m9l2c1s1r-minmax-fp32-scalar-fmagic.c.o
[ 24%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-8f8m9l2c1s1r-minmax-fp32-scalar-imagic.c.o
[ 24%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-8f8m9l2c1s1r-minmax-fp32-scalar-lrintf.c.o
[ 24%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-8f8m9l4c1s1r-minmax-fp32-scalar-fmagic.c.o
[ 24%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-8f8m9l4c1s1r-minmax-fp32-scalar-imagic.c.o
[ 24%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-8f8m9l4c1s1r-minmax-fp32-scalar-lrintf.c.o
[ 24%] Building CXX object c10/cuda/CMakeFiles/c10_cuda.dir/impl/CUDAGuardImpl.cpp.o
[ 24%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-9p1c-minmax-fp32-scalar-fmagic.c.o
[ 24%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-9p1c-minmax-fp32-scalar-imagic.c.o
[ 24%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/EmptyTensor.cpp.o
[ 25%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-9p1c-minmax-fp32-scalar-lrintf.c.o
[ 25%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-9p1c-minmax-rndnu-scalar.c.o
[ 25%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-9p2c-minmax-fp32-scalar-fmagic.c.o
[ 25%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-9p2c-minmax-fp32-scalar-imagic.c.o
[ 25%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-9p2c-minmax-fp32-scalar-lrintf.c.o
[ 25%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-9p2c-minmax-rndnu-scalar.c.o
[ 25%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-9p4c-minmax-fp32-scalar-fmagic.c.o
[ 25%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-9p4c-minmax-fp32-scalar-imagic.c.o
[ 25%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-9p4c-minmax-fp32-scalar-lrintf.c.o
[ 25%] Building CXX object c10/cuda/CMakeFiles/c10_cuda.dir/impl/CUDATest.cpp.o
[ 25%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-9p4c-minmax-rndnu-scalar.c.o
[ 25%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-25p1c-minmax-fp32-scalar-fmagic.c.o
[ 25%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/ExpandUtils.cpp.o
[ 25%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-25p1c-minmax-fp32-scalar-imagic.c.o
[ 25%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-25p1c-minmax-fp32-scalar-lrintf.c.o
[ 25%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-25p2c-minmax-fp32-scalar-fmagic.c.o
[ 25%] Building CXX object c10/cuda/CMakeFiles/c10_cuda.dir/driver_api.cpp.o
[ 25%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-25p2c-minmax-fp32-scalar-imagic.c.o
[ 25%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-25p2c-minmax-fp32-scalar-lrintf.c.o
[ 25%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-25p4c-minmax-fp32-scalar-fmagic.c.o
[ 25%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-25p4c-minmax-fp32-scalar-imagic.c.o
[ 25%] Linking CXX shared library ../../lib/libc10_cuda.so
Warning: Unused direct dependencies:
	libc10.so.2.4
	/lib64/libgflags.so.2.2
	/lib64/libglog.so.0
	/lib64/libm.so.6
[ 25%] Built target c10_cuda
[ 25%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-25p4c-minmax-fp32-scalar-lrintf.c.o
[ 25%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-f32-vcvt/gen/qu8-f32-vcvt-scalar-u1.c.o
[ 25%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-f32-vcvt/gen/qu8-f32-vcvt-scalar-u2.c.o
[ 25%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-f32-vcvt/gen/qu8-f32-vcvt-scalar-u3.c.o
[ 25%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/FuncTorchTLS.cpp.o
[ 25%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-f32-vcvt/gen/qu8-f32-vcvt-scalar-u4.c.o
[ 25%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gavgpool/gen/qu8-gavgpool-7p7x-minmax-fp32-scalar-fmagic-c1.c.o
[ 25%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gavgpool/gen/qu8-gavgpool-7p7x-minmax-fp32-scalar-fmagic-c2.c.o
[ 25%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/FunctionalInverses.cpp.o
[ 25%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/FunctionalStorageImpl.cpp.o
[ 25%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gavgpool/gen/qu8-gavgpool-7p7x-minmax-fp32-scalar-fmagic-c4.c.o
[ 25%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gavgpool/gen/qu8-gavgpool-7p7x-minmax-fp32-scalar-imagic-c1.c.o
[ 25%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gavgpool/gen/qu8-gavgpool-7p7x-minmax-fp32-scalar-imagic-c2.c.o
[ 25%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gavgpool/gen/qu8-gavgpool-7p7x-minmax-fp32-scalar-imagic-c4.c.o
[ 25%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gavgpool/gen/qu8-gavgpool-7p7x-minmax-fp32-scalar-lrintf-c1.c.o
[ 25%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gavgpool/gen/qu8-gavgpool-7p7x-minmax-fp32-scalar-lrintf-c2.c.o
[ 25%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gavgpool/gen/qu8-gavgpool-7p7x-minmax-fp32-scalar-lrintf-c4.c.o
[ 25%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gavgpool/gen/qu8-gavgpool-7x-minmax-fp32-scalar-fmagic-c1.c.o
[ 25%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gavgpool/gen/qu8-gavgpool-7x-minmax-fp32-scalar-fmagic-c2.c.o
[ 25%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gavgpool/gen/qu8-gavgpool-7x-minmax-fp32-scalar-fmagic-c4.c.o
[ 25%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gavgpool/gen/qu8-gavgpool-7x-minmax-fp32-scalar-imagic-c1.c.o
[ 25%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gavgpool/gen/qu8-gavgpool-7x-minmax-fp32-scalar-imagic-c2.c.o
[ 25%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gavgpool/gen/qu8-gavgpool-7x-minmax-fp32-scalar-imagic-c4.c.o
[ 25%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gavgpool/gen/qu8-gavgpool-7x-minmax-fp32-scalar-lrintf-c1.c.o
[ 25%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gavgpool/gen/qu8-gavgpool-7x-minmax-fp32-scalar-lrintf-c2.c.o
[ 25%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gavgpool/gen/qu8-gavgpool-7x-minmax-fp32-scalar-lrintf-c4.c.o
[ 25%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-1x2-minmax-fp32-scalar-fmagic.c.o
[ 25%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-1x2-minmax-fp32-scalar-imagic.c.o
[ 25%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-1x2-minmax-fp32-scalar-lrintf.c.o
[ 25%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-1x2-minmax-rndnu-scalar.c.o
[ 25%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-1x4-minmax-fp32-scalar-fmagic.c.o
[ 25%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-1x4-minmax-fp32-scalar-imagic.c.o
[ 25%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-1x4-minmax-fp32-scalar-lrintf.c.o
[ 25%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-1x4-minmax-rndnu-scalar.c.o
[ 25%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-2x2-minmax-fp32-scalar-fmagic.c.o
[ 25%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-2x2-minmax-fp32-scalar-imagic.c.o
[ 25%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-2x2-minmax-fp32-scalar-lrintf.c.o
[ 25%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-2x2-minmax-rndnu-scalar.c.o
[ 25%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-2x4-minmax-fp32-scalar-fmagic.c.o
[ 25%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-2x4-minmax-fp32-scalar-imagic.c.o
[ 25%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/FunctionalTensorWrapper.cpp.o
[ 25%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-2x4-minmax-fp32-scalar-lrintf.c.o
[ 25%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-2x4-minmax-rndnu-scalar.c.o
[ 25%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-3x2-minmax-fp32-scalar-fmagic.c.o
[ 25%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-3x2-minmax-fp32-scalar-imagic.c.o
[ 26%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-3x2-minmax-fp32-scalar-lrintf.c.o
[ 26%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-3x2-minmax-rndnu-scalar.c.o
[ 26%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-3x4-minmax-fp32-scalar-fmagic.c.o
[ 26%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-3x4-minmax-fp32-scalar-imagic.c.o
[ 26%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-3x4-minmax-fp32-scalar-lrintf.c.o
[ 26%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-3x4-minmax-rndnu-scalar.c.o
[ 26%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-4x2-minmax-fp32-scalar-fmagic.c.o
[ 26%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-4x2-minmax-fp32-scalar-imagic.c.o
[ 26%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-4x2-minmax-fp32-scalar-lrintf.c.o
[ 26%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-4x2-minmax-rndnu-scalar.c.o
[ 26%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-4x4-minmax-fp32-scalar-fmagic.c.o
[ 26%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-4x4-minmax-fp32-scalar-imagic.c.o
[ 26%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/FunctionalizeFallbackKernel.cpp.o
[ 26%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-4x4-minmax-fp32-scalar-lrintf.c.o
[ 26%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-4x4-minmax-rndnu-scalar.c.o
[ 26%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-1x2-minmax-fp32-scalar-fmagic.c.o
[ 26%] Building C object
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-1x2-minmax-fp32-scalar-imagic.c.o [ 26%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-1x2-minmax-fp32-scalar-lrintf.c.o [ 26%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-1x2-minmax-rndnu-scalar.c.o [ 26%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-1x4-minmax-fp32-scalar-fmagic.c.o [ 26%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-1x4-minmax-fp32-scalar-imagic.c.o [ 26%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-1x4-minmax-fp32-scalar-lrintf.c.o [ 26%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-1x4-minmax-rndnu-scalar.c.o [ 26%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-2x2-minmax-fp32-scalar-fmagic.c.o [ 26%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-2x2-minmax-fp32-scalar-imagic.c.o [ 26%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-2x2-minmax-fp32-scalar-lrintf.c.o [ 26%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-2x2-minmax-rndnu-scalar.c.o [ 26%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-2x4-minmax-fp32-scalar-fmagic.c.o [ 26%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-2x4-minmax-fp32-scalar-imagic.c.o [ 26%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-2x4-minmax-fp32-scalar-lrintf.c.o [ 26%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-2x4-minmax-rndnu-scalar.c.o [ 26%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-3x2-minmax-fp32-scalar-fmagic.c.o [ 26%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-3x2-minmax-fp32-scalar-imagic.c.o [ 26%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-3x2-minmax-fp32-scalar-lrintf.c.o [ 26%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-3x2-minmax-rndnu-scalar.c.o [ 26%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-3x4-minmax-fp32-scalar-fmagic.c.o [ 26%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-3x4-minmax-fp32-scalar-imagic.c.o [ 26%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-3x4-minmax-fp32-scalar-lrintf.c.o [ 26%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-3x4-minmax-rndnu-scalar.c.o [ 26%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-4x2-minmax-fp32-scalar-fmagic.c.o [ 26%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-4x2-minmax-fp32-scalar-imagic.c.o [ 26%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-4x2-minmax-fp32-scalar-lrintf.c.o [ 26%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-4x2-minmax-rndnu-scalar.c.o [ 26%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-4x4-minmax-fp32-scalar-fmagic.c.o [ 26%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/LegacyBatchedFallback.cpp.o [ 26%] 
Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-4x4-minmax-fp32-scalar-imagic.c.o [ 26%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-4x4-minmax-fp32-scalar-lrintf.c.o [ 26%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-4x4-minmax-rndnu-scalar.c.o [ 26%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-requantization/qu8-requantization-fp32-scalar-fmagic.c.o [ 26%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-requantization/qu8-requantization-fp32-scalar-lrintf.c.o [ 26%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-requantization/qu8-requantization-gemmlowp-scalar.c.o [ 26%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-requantization/qu8-requantization-rndna-scalar-signed64.c.o [ 26%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-requantization/qu8-requantization-rndna-scalar-unsigned32.c.o [ 26%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-requantization/qu8-requantization-rndna-scalar-unsigned64.c.o [ 26%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vadd/gen/qu8-vadd-minmax-scalar-u1.c.o [ 26%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vadd/gen/qu8-vadd-minmax-scalar-u2.c.o [ 26%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vadd/gen/qu8-vadd-minmax-scalar-u4.c.o [ 26%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vaddc/gen/qu8-vaddc-minmax-scalar-u1.c.o [ 26%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vaddc/gen/qu8-vaddc-minmax-scalar-u2.c.o [ 26%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vaddc/gen/qu8-vaddc-minmax-scalar-u4.c.o [ 26%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vcvt/gen/qu8-vcvt-scalar-u1.c.o [ 26%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vcvt/gen/qu8-vcvt-scalar-u2.c.o [ 26%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/LegacyBatchedTensorImpl.cpp.o [ 27%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vcvt/gen/qu8-vcvt-scalar-u4.c.o [ 27%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vhswish/gen/qu8-vhswish-scalar-u1.c.o [ 27%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vhswish/gen/qu8-vhswish-scalar-u2.c.o [ 27%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vhswish/gen/qu8-vhswish-scalar-u4.c.o [ 27%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vlrelu/gen/qu8-vlrelu-scalar-andxor-u1.c.o [ 27%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vlrelu/gen/qu8-vlrelu-scalar-andxor-u2.c.o [ 27%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vlrelu/gen/qu8-vlrelu-scalar-andxor-u4.c.o [ 27%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vlrelu/gen/qu8-vlrelu-scalar-select-u1.c.o [ 27%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vlrelu/gen/qu8-vlrelu-scalar-select-u2.c.o [ 27%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vlrelu/gen/qu8-vlrelu-scalar-select-u4.c.o [ 27%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vmul/gen/qu8-vmul-minmax-fp32-scalar-u1.c.o [ 27%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vmul/gen/qu8-vmul-minmax-fp32-scalar-u2.c.o [ 27%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vmul/gen/qu8-vmul-minmax-fp32-scalar-u4.c.o [ 27%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vmulc/gen/qu8-vmulc-minmax-fp32-scalar-u1.c.o [ 27%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vmulc/gen/qu8-vmulc-minmax-fp32-scalar-u2.c.o [ 27%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vmulc/gen/qu8-vmulc-minmax-fp32-scalar-u4.c.o [ 27%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/s8-ibilinear/gen/s8-ibilinear-scalar-c1.c.o [ 27%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/s8-ibilinear/gen/s8-ibilinear-scalar-c2.c.o [ 27%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/s8-ibilinear/gen/s8-ibilinear-scalar-c4.c.o [ 27%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/s8-maxpool/s8-maxpool-9p8x-minmax-scalar-c1.c.o [ 27%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/s8-vclamp/s8-vclamp-scalar-u4.c.o [ 27%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/s16-rmaxabs/gen/s16-rmaxabs-scalar-x1.c.o [ 27%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/s16-rmaxabs/gen/s16-rmaxabs-scalar-x2.c.o [ 27%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/s16-rmaxabs/gen/s16-rmaxabs-scalar-x3.c.o [ 27%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/s16-rmaxabs/gen/s16-rmaxabs-scalar-x4.c.o [ 27%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/s16-window/gen/s16-window-scalar-u1.c.o [ 27%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/s16-window/gen/s16-window-scalar-u2.c.o [ 27%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/s16-window/gen/s16-window-scalar-u3.c.o [ 27%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/s16-window/gen/s16-window-scalar-u4.c.o [ 27%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/u8-ibilinear/gen/u8-ibilinear-scalar-c1.c.o [ 27%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/u8-ibilinear/gen/u8-ibilinear-scalar-c2.c.o [ 27%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/u8-ibilinear/gen/u8-ibilinear-scalar-c4.c.o [ 27%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/u8-lut32norm/u8-lut32norm-scalar.c.o [ 27%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/u8-maxpool/u8-maxpool-9p8x-minmax-scalar-c1.c.o [ 27%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/u8-rmax/u8-rmax-scalar-u2.c.o [ 27%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/u8-vclamp/u8-vclamp-scalar-u4.c.o [ 27%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/u32-filterbank-accumulate/gen/u32-filterbank-accumulate-scalar-x1.c.o [ 27%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/u32-filterbank-subtract/u32-filterbank-subtract-scalar-x2.c.o [ 27%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/u32-vlog/gen/u32-vlog-scalar-x1.c.o [ 27%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/u32-vlog/gen/u32-vlog-scalar-x2.c.o [ 27%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/u32-vlog/gen/u32-vlog-scalar-x3.c.o [ 27%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/u32-vlog/gen/u32-vlog-scalar-x4.c.o [ 27%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/LegacyBatchingRegistrations.cpp.o [ 27%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/u64-u32-vsqrtshift/u64-u32-vsqrtshift-scalar-cvtu32-sqrt-cvtu32f64-u1.c.o [ 27%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x8-lut/gen/x8-lut-scalar-u1.c.o [ 27%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x8-lut/gen/x8-lut-scalar-u2.c.o [ 27%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x8-lut/gen/x8-lut-scalar-u4.c.o [ 27%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x8-lut/gen/x8-lut-scalar-u8.c.o [ 27%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x8-lut/gen/x8-lut-scalar-u16.c.o [ 27%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x8-packw/gen/x8-packw-x2-gemm-goi-scalar-int-u2.c.o [ 27%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x8-packw/gen/x8-packw-x2-gemm-goi-scalar-int-u4.c.o [ 27%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x8-packw/gen/x8-packw-x4-gemm-goi-scalar-int-u2.c.o [ 27%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x8-packw/gen/x8-packw-x4-gemm-goi-scalar-int-u4.c.o [ 27%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x8-packw/gen/x8-packw-x8-gemm-goi-scalar-int-u2.c.o [ 27%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x8-packw/gen/x8-packw-x8-gemm-goi-scalar-int-u4.c.o [ 27%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x8-packw/gen/x8-packw-x16-gemm-goi-scalar-int-u2.c.o [ 27%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x8-packw/gen/x8-packw-x16-gemm-goi-scalar-int-u4.c.o [ 27%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x8-packw/gen/x8-packw-x32-gemm-goi-scalar-int-u2.c.o [ 27%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x8-packw/gen/x8-packw-x32-gemm-goi-scalar-int-u4.c.o [ 27%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/LegacyVmapMode.cpp.o [ 27%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x8-transposec/gen/x8-transposec-1x2-scalar-int.c.o [ 27%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x8-transposec/gen/x8-transposec-1x4-scalar-int.c.o [ 28%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x8-transposec/gen/x8-transposec-2x1-scalar-int.c.o [ 28%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x8-transposec/gen/x8-transposec-2x2-scalar-int.c.o [ 28%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x8-transposec/gen/x8-transposec-2x4-scalar-int.c.o [ 28%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x8-transposec/gen/x8-transposec-4x1-scalar-int.c.o [ 28%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/LegacyVmapTransforms.cpp.o [ 28%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x8-transposec/gen/x8-transposec-4x2-scalar-int.c.o [ 28%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x8-transposec/gen/x8-transposec-4x4-scalar-int.c.o [ 28%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x8-zip/x8-zip-x2-scalar.c.o [ 28%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x8-zip/x8-zip-x3-scalar.c.o [ 28%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x8-zip/x8-zip-x4-scalar.c.o [ 28%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x8-zip/x8-zip-xm-scalar.c.o [ 28%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x16-packw/gen/x16-packw-x8-gemm-goi-scalar-int-u4.c.o [ 28%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x16-packw/gen/x16-packw-x16-gemm-goi-scalar-int-u4.c.o [ 28%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x16-transposec/gen/x16-transposec-1x2-scalar-int.c.o [ 28%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x16-transposec/gen/x16-transposec-1x4-scalar-int.c.o [ 28%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x16-transposec/gen/x16-transposec-2x1-scalar-int.c.o [ 28%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x16-transposec/gen/x16-transposec-2x2-scalar-int.c.o [ 28%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x16-transposec/gen/x16-transposec-2x4-scalar-int.c.o [ 28%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x16-transposec/gen/x16-transposec-4x1-scalar-int.c.o [ 28%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x16-transposec/gen/x16-transposec-4x2-scalar-int.c.o [ 28%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x16-transposec/gen/x16-transposec-4x4-scalar-int.c.o [ 28%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x24-transposec/gen/x24-transposec-1x2-scalar.c.o [ 28%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x24-transposec/gen/x24-transposec-1x4-scalar.c.o [ 28%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x24-transposec/gen/x24-transposec-2x1-scalar.c.o [ 28%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x24-transposec/gen/x24-transposec-2x2-scalar.c.o [ 28%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x24-transposec/gen/x24-transposec-2x4-scalar.c.o [ 28%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x24-transposec/gen/x24-transposec-4x1-scalar.c.o [ 28%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x24-transposec/gen/x24-transposec-4x2-scalar.c.o [ 28%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x24-transposec/gen/x24-transposec-4x4-scalar.c.o [ 28%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-packb/gen/x32-packb-2c1s1r-gemm-scalar-float.c.o [ 28%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-packb/gen/x32-packb-2c1s1r-gemm-scalar-int.c.o [ 28%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-packb/gen/x32-packb-2c2s1r-gemm-scalar-float.c.o [ 28%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-packb/gen/x32-packb-2c2s1r-gemm-scalar-int.c.o [ 28%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-packb/gen/x32-packb-4c1s1r-gemm-scalar-float.c.o [ 28%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-packb/gen/x32-packb-4c1s1r-gemm-scalar-int.c.o [ 28%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-packb/gen/x32-packb-4c4s1r-gemm-scalar-float.c.o [ 28%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-packb/gen/x32-packb-4c4s1r-gemm-scalar-int.c.o [ 28%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-packw/gen/x32-packw-x2-gemm-goi-scalar-float-u4.c.o [ 28%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-packw/gen/x32-packw-x2-gemm-goi-scalar-int-u4.c.o [ 28%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-packw/gen/x32-packw-x3-gemm-goi-scalar-float-u4.c.o [ 28%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-packw/gen/x32-packw-x3-gemm-goi-scalar-int-u4.c.o [ 28%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-packw/gen/x32-packw-x4-gemm-goi-scalar-float-u4.c.o [ 28%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-packw/gen/x32-packw-x4-gemm-goi-scalar-int-u4.c.o [ 28%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-packw/gen/x32-packw-x8-gemm-goi-scalar-float-u4.c.o [ 28%] Building C 
object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-packw/gen/x32-packw-x8-gemm-goi-scalar-int-u4.c.o [ 28%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-packw/gen/x32-packw-x16-gemm-goi-scalar-float-u4.c.o [ 28%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-packw/gen/x32-packw-x16-gemm-goi-scalar-int-u4.c.o [ 28%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-packx/x32-packx-2x-scalar.c.o [ 28%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-packx/x32-packx-3x-scalar.c.o [ 28%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-packx/x32-packx-4x-scalar.c.o [ 28%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/MapAllocator.cpp.o [ 28%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-transposec/gen/x32-transposec-1x2-scalar-float.c.o [ 28%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-transposec/gen/x32-transposec-1x2-scalar-int.c.o [ 28%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-transposec/gen/x32-transposec-1x4-scalar-float.c.o [ 28%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-transposec/gen/x32-transposec-1x4-scalar-int.c.o [ 28%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-transposec/gen/x32-transposec-2x1-scalar-float.c.o [ 28%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-transposec/gen/x32-transposec-2x1-scalar-int.c.o [ 28%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-transposec/gen/x32-transposec-2x2-scalar-float.c.o [ 28%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-transposec/gen/x32-transposec-2x2-scalar-int.c.o [ 28%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-transposec/gen/x32-transposec-2x4-scalar-float.c.o [ 28%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-transposec/gen/x32-transposec-2x4-scalar-int.c.o [ 29%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-transposec/gen/x32-transposec-4x1-scalar-float.c.o [ 29%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-transposec/gen/x32-transposec-4x1-scalar-int.c.o [ 29%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-transposec/gen/x32-transposec-4x2-scalar-float.c.o [ 29%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-transposec/gen/x32-transposec-4x2-scalar-int.c.o [ 29%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-transposec/gen/x32-transposec-4x4-scalar-float.c.o [ 29%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-transposec/gen/x32-transposec-4x4-scalar-int.c.o [ 29%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-unpool/x32-unpool-scalar.c.o [ 29%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-zerob/gen/x32-zerob-2c1s1r-gemm-scalar-float.c.o [ 29%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-zerob/gen/x32-zerob-2c1s1r-gemm-scalar-int.c.o [ 29%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-zerob/gen/x32-zerob-2c2s1r-gemm-scalar-float.c.o [ 29%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-zerob/gen/x32-zerob-2c2s1r-gemm-scalar-int.c.o [ 29%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-zerob/gen/x32-zerob-4c1s1r-gemm-scalar-float.c.o [ 29%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-zerob/gen/x32-zerob-4c1s1r-gemm-scalar-int.c.o [ 29%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-zerob/gen/x32-zerob-4c4s1r-gemm-scalar-float.c.o [ 29%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-zerob/gen/x32-zerob-4c4s1r-gemm-scalar-int.c.o [ 29%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-zip/x32-zip-x2-scalar.c.o [ 29%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-zip/x32-zip-x3-scalar.c.o [ 29%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-zip/x32-zip-x4-scalar.c.o [ 29%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-zip/x32-zip-xm-scalar.c.o [ 29%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x64-transposec/gen/x64-transposec-1x2-scalar-float.c.o [ 29%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x64-transposec/gen/x64-transposec-1x2-scalar-int.c.o [ 29%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x64-transposec/gen/x64-transposec-2x1-scalar-float.c.o [ 29%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x64-transposec/gen/x64-transposec-2x1-scalar-int.c.o [ 29%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x64-transposec/gen/x64-transposec-2x2-scalar-float.c.o [ 29%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x64-transposec/gen/x64-transposec-2x2-scalar-int.c.o [ 29%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x64-transposec/gen/x64-transposec-4x1-scalar-float.c.o [ 29%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x64-transposec/gen/x64-transposec-4x1-scalar-int.c.o [ 29%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x64-transposec/gen/x64-transposec-4x2-scalar-float.c.o [ 29%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x64-transposec/gen/x64-transposec-4x2-scalar-int.c.o
[ 29%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/xx-copy/xx-copy-scalar-memcpy.c.o
[ 29%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/xx-fill/xx-fill-scalar-u16.c.o
[ 29%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/xx-pad/xx-pad-p4-scalar-u16.c.o
[ 29%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/xx-transposev/xx-transposev-1x1-scalar-memcpy.c.o
[ 29%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-fma-expm1minus-rr1-lut8-p4h3ts-div-u1.c.o
[ 29%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-fma-expm1minus-rr1-lut8-p4h3ts-div-u2.c.o
[ 29%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-fma-expm1minus-rr1-lut8-p4h3ts-div-u4.c.o
[ 29%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-fma-expm1minus-rr1-p6h5ts-div-u1.c.o
[ 29%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-fma-expm1minus-rr1-p6h5ts-div-u2.c.o
[ 29%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-fma-expm1minus-rr1-p6h5ts-div-u4.c.o
[ 29%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1minus-rr1-lut4-p4h2ts-div.c.o
[ 29%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1minus-rr1-lut4-p4h2ts-rcp.c.o
[ 29%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1minus-rr1-lut4-p4h3ps-div.c.o
[ 29%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1minus-rr1-lut4-p4h3ps-rcp.c.o
[ 29%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1minus-rr1-lut4-p4h3ts-div.c.o
[ 29%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1minus-rr1-lut4-p4h3ts-rcp.c.o
[ 29%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1minus-rr1-lut8-p3h1ts-div.c.o
[ 29%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1minus-rr1-lut8-p4h2ts-div.c.o
[ 29%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1minus-rr1-lut8-p4h2ts-rcp.c.o
[ 29%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1minus-rr1-lut8-p4h3ps-div.c.o
[ 29%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1minus-rr1-lut8-p4h3ps-rcp.c.o
[ 29%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1minus-rr1-lut8-p4h3ts-div.c.o
[ 29%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1minus-rr1-lut8-p4h3ts-rcp.c.o
[ 29%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1minus-rr1-lut16-p3h1ts-div.c.o
[ 29%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1minus-rr1-lut16-p4h2ts-div.c.o
[ 29%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1minus-rr1-lut16-p4h2ts-rcp.c.o
[ 29%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1minus-rr1-lut16-p4h3ps-div.c.o
[ 29%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1minus-rr1-lut16-p4h3ts-div.c.o
[ 29%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1minus-rr1-lut32-p3h1ts-div.c.o
[ 29%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1minus-rr1-lut64-p3h1ts-div.c.o
[ 29%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1minus-rr1-p6h4ts-div.c.o
[ 30%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1minus-rr1-p6h5ps-div.c.o
[ 30%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1minus-rr1-p6h5ps-rcp.c.o
[ 30%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1minus-rr1-p6h5ts-div.c.o
[ 30%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1minus-rr1-p6h5ts-rcp.c.o
[ 30%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/MemoryOverlap.cpp.o
[ 30%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1minus-rr2-lut4-p4h2ts-div.c.o
[ 30%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1minus-rr2-lut4-p4h3ps-div.c.o
[ 30%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1minus-rr2-lut4-p4h3ts-div.c.o
[ 30%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1minus-rr2-lut8-p3h1ts-div.c.o
[ 30%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1minus-rr2-lut8-p4h2ts-div.c.o
[ 30%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1minus-rr2-lut8-p4h2ts-rcp.c.o
[ 30%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1minus-rr2-lut8-p4h3ps-div.c.o
[ 30%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1minus-rr2-lut8-p4h3ts-div.c.o
[ 30%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1minus-rr2-lut16-p3h1ts-div.c.o
[ 30%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1minus-rr2-lut16-p4h2ts-div.c.o
[ 30%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1minus-rr2-lut16-p4h3ps-div.c.o
[ 30%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1minus-rr2-lut16-p4h3ts-div.c.o
[ 30%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1minus-rr2-lut32-p3h1ts-div.c.o
[ 30%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1minus-rr2-lut64-p3h1ts-div.c.o
[ 30%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1minus-rr2-p6h4ts-div.c.o
[ 30%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1minus-rr2-p6h5ps-div.c.o
[ 30%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1minus-rr2-p6h5ts-div.c.o
[ 30%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1plus-rr1-lut4-p4h2ts-div.c.o
[ 30%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/NamedTensorUtils.cpp.o
[ 30%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1plus-rr1-lut4-p4h3ps-div.c.o
[ 30%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1plus-rr1-lut4-p4h3ts-div.c.o
[ 30%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1plus-rr1-lut8-p3h1ts-div.c.o
[ 30%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1plus-rr1-lut8-p4h2ts-div.c.o
[ 30%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1plus-rr1-lut8-p4h3ps-div.c.o
[ 30%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1plus-rr1-lut8-p4h3ts-div.c.o
[ 30%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1plus-rr1-lut16-p3h1ts-div.c.o
[ 30%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1plus-rr1-lut16-p4h2ts-div.c.o
[ 30%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1plus-rr1-lut16-p4h3ps-div.c.o
[ 30%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1plus-rr1-lut16-p4h3ts-div.c.o
[ 30%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1plus-rr1-lut32-p3h1ts-div.c.o
[ 30%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1plus-rr1-lut64-p3h1ts-div.c.o
[ 30%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1plus-rr1-p6h4ts-div.c.o
[ 30%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1plus-rr1-p6h5ps-div.c.o
[ 30%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1plus-rr1-p6h5ts-div.c.o
[ 30%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1plus-rr2-lut4-p4h2ts-div.c.o
[ 30%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/NestedTensorImpl.cpp.o
[ 30%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1plus-rr2-lut4-p4h3ps-div.c.o
[ 30%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1plus-rr2-lut4-p4h3ts-div.c.o
[ 30%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1plus-rr2-lut8-p3h1ts-div.c.o
[ 30%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1plus-rr2-lut8-p4h2ts-div.c.o
[ 30%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1plus-rr2-lut8-p4h3ps-div.c.o
[ 30%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1plus-rr2-lut8-p4h3ts-div.c.o
[ 30%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1plus-rr2-lut16-p3h1ts-div.c.o
[ 30%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1plus-rr2-lut16-p4h2ts-div.c.o
[ 30%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1plus-rr2-lut16-p4h3ps-div.c.o
[ 30%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1plus-rr2-lut16-p4h3ts-div.c.o
[ 30%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1plus-rr2-lut32-p3h1ts-div.c.o
[ 30%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1plus-rr2-lut64-p3h1ts-div.c.o
[ 30%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1plus-rr2-p6h4ts-div.c.o
[ 30%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1plus-rr2-p6h5ps-div.c.o
[ 30%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1plus-rr2-p6h5ts-div.c.o
[ 30%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/cs16-bfly4/cs16-bfly4-neon-x1.c.o
[ 30%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/cs16-bfly4/cs16-bfly4-neon-x4.c.o
[ 30%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/cs16-bfly4/cs16-bfly4-samples1-neon.c.o
[ 30%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/cs16-bfly4/cs16-bfly4-samples4-neon.c.o
[ 30%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/cs16-fftr/cs16-fftr-neon-x4.c.o
[ 30%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/cs16-vsquareabs/gen/cs16-vsquareabs-neon-mlal-ld128-x4.c.o
[ 30%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/cs16-vsquareabs/gen/cs16-vsquareabs-neon-mlal-ld128-x8.c.o
[ 31%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/cs16-vsquareabs/gen/cs16-vsquareabs-neon-mlal-ld128-x12.c.o
[ 31%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/cs16-vsquareabs/gen/cs16-vsquareabs-neon-mlal-ld128-x16.c.o
[ 31%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-f32-vcvt/gen/f16-f32-vcvt-neon-int16-u8.c.o
[ 31%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-f32-vcvt/gen/f16-f32-vcvt-neon-int16-u16.c.o
[ 31%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-f32-vcvt/gen/f16-f32-vcvt-neon-int16-u24.c.o
[ 31%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/ParallelCommon.cpp.o
[ 31%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-f32-vcvt/gen/f16-f32-vcvt-neon-int16-u32.c.o
[ 31%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-f32-vcvt/gen/f16-f32-vcvt-neon-int32-u8.c.o
[ 31%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-f32-vcvt/gen/f16-f32-vcvt-neon-int32-u16.c.o
[ 31%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-f32-vcvt/gen/f16-f32-vcvt-neon-int32-u24.c.o
[ 31%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-f32-vcvt/gen/f16-f32-vcvt-neon-int32-u32.c.o
[ 31%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/ParallelNative.cpp.o
[ 31%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/ParallelNativeTBB.cpp.o
[ 31%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/ParallelOpenMP.cpp.o
[ 31%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-argmaxpool/f32-argmaxpool-4x-neon-c4.c.o
[ 31%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-argmaxpool/f32-argmaxpool-9p8x-neon-c4.c.o
[ 32%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/ParallelThreadPoolNative.cpp.o
[ 32%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-argmaxpool/f32-argmaxpool-9x-neon-c4.c.o
[ 32%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-avgpool/f32-avgpool-9p8x-minmax-neon-c4.c.o
[ 32%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-avgpool/f32-avgpool-9x-minmax-neon-c4.c.o
[ 32%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-conv-hwc2chw/f32-conv-hwc2chw-3x3s2p1c3x4-neon-2x2.c.o
[ 32%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-conv-hwc/gen/f32-conv-hwc-3x3s2p0p1c3x4-neon-2x1.c.o
[ 32%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-conv-hwc/gen/f32-conv-hwc-3x3s2p0p1c3x4-neon-2x2.c.o
[ 32%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-conv-hwc/gen/f32-conv-hwc-3x3s2p0p1c3x8-neon-2x1.c.o
[ 32%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/PythonTorchFunctionTLS.cpp.o
[ 32%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-conv-hwc/gen/f32-conv-hwc-3x3s2p0p1c3x8-neon-2x2.c.o
[ 32%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-conv-hwc/gen/f32-conv-hwc-3x3s2p1c3x4-neon-2x1.c.o
[ 32%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-conv-hwc/gen/f32-conv-hwc-3x3s2p1c3x4-neon-2x2.c.o
[ 32%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-conv-hwc/gen/f32-conv-hwc-3x3s2p1c3x8-neon-2x1.c.o
[ 32%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-conv-hwc/gen/f32-conv-hwc-3x3s2p1c3x8-neon-2x2.c.o
[ 32%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/SavedTensorHooks.cpp.o
[ 32%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-3x3p1-minmax-neon-1x4-acc2.c.o
[ 32%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-3x3p1-minmax-neon-1x4-acc3.c.o
[ 32%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/ScalarOps.cpp.o
[ 32%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-3x3p1-minmax-neon-1x4-acc4.c.o
[ 32%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/SequenceNumber.cpp.o
[ 32%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/SparseCsrTensorImpl.cpp.o
[ 32%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-3x3p1-minmax-neon-1x4.c.o
[ 32%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/SparseTensorImpl.cpp.o
[ 32%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-3x3p1-minmax-neon-2x4-acc2.c.o
[ 32%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-3x3p1-minmax-neon-2x4.c.o
[ 32%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-3x3p1-minmax-neon-3x4.c.o
[ 32%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-3x3p1-minmax-neon-4x4.c.o
[ 32%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-3x3p1-minmax-neon-5x4.c.o
[ 32%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-3x3p1-minmax-neon-6x4.c.o
[ 32%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-3x3s2p1-minmax-neon-1x4-acc2.c.o
[ 32%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-3x3s2p1-minmax-neon-1x4-acc3.c.o
[ 32%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-3x3s2p1-minmax-neon-1x4-acc4.c.o
[ 32%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-3x3s2p1-minmax-neon-1x4.c.o
[ 32%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-3x3s2p1-minmax-neon-2x4-acc2.c.o
[ 32%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-3x3s2p1-minmax-neon-2x4.c.o
[ 32%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-3x3s2p1-minmax-neon-3x4.c.o
[ 32%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-3x3s2p1-minmax-neon-4x4.c.o
[ 32%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/StorageUtils.cpp.o
[ 32%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-5x5p2-minmax-neon-1x4-acc2.c.o
[ 32%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-5x5p2-minmax-neon-1x4-acc3.c.o
[ 32%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-5x5p2-minmax-neon-1x4-acc4.c.o
[ 32%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-5x5p2-minmax-neon-1x4-acc5.c.o
[ 32%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-5x5p2-minmax-neon-1x4.c.o
[ 32%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-5x5p2-minmax-neon-2x4-acc2.c.o
[ 32%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-5x5p2-minmax-neon-2x4-acc3.c.o
[ 32%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-5x5p2-minmax-neon-2x4.c.o
[ 32%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-5x5p2-minmax-neon-3x4-acc2.c.o
[ 32%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-5x5p2-minmax-neon-3x4.c.o
[ 32%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-5x5p2-minmax-neon-4x4-acc2.c.o
[ 32%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-5x5p2-minmax-neon-4x4.c.o
[ 32%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/TensorGeometry.cpp.o
[ 32%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/TensorIndexing.cpp.o
[ 32%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-5x5p2-minmax-neon-5x4.c.o
[ 32%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-5x5s2p2-minmax-neon-1x4-acc2.c.o
[ 32%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-5x5s2p2-minmax-neon-1x4-acc3.c.o
[ 32%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-5x5s2p2-minmax-neon-1x4-acc4.c.o
[ 32%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/TensorIterator.cpp.o
[ 32%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-5x5s2p2-minmax-neon-1x4-acc5.c.o
[ 32%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/TensorMeta.cpp.o
[ 33%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-5x5s2p2-minmax-neon-1x4.c.o
[ 33%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-5x5s2p2-minmax-neon-2x4-acc2.c.o
[ 33%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-5x5s2p2-minmax-neon-2x4-acc3.c.o
[ 33%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-5x5s2p2-minmax-neon-2x4.c.o
[ 33%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/TensorNames.cpp.o
[ 33%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-5x5s2p2-minmax-neon-3x4-acc2.c.o
[ 33%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-5x5s2p2-minmax-neon-3x4.c.o
[ 33%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-3p4c-minmax-neon-acc2.c.o
[ 33%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-3p4c-minmax-neon.c.o
[ 33%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-3p8c-minmax-neon-acc2.c.o
[ 33%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-3p8c-minmax-neon.c.o
[ 33%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-3p16c-minmax-neon-acc2.c.o
[ 33%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-3p16c-minmax-neon.c.o
[ 33%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/TensorUtils.cpp.o
[ 33%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-4p4c-minmax-neon-acc2.c.o
[ 33%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-4p4c-minmax-neon.c.o
[ 33%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-4p8c-minmax-neon-acc2.c.o
[ 33%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-4p8c-minmax-neon.c.o
[ 33%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-4p16c-minmax-neon-acc2.c.o
[ 33%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/ThreadLocalPythonObjects.cpp.o
[ 33%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-4p16c-minmax-neon.c.o
[ 33%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-5f5m5l4c4s4r-minmax-neon-acc2.c.o
[ 33%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-5f5m5l4c4s4r-minmax-neon.c.o
[ 33%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-5f5m5l8c4s4r-minmax-neon-acc2.c.o
[ 33%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-5f5m5l8c4s4r-minmax-neon.c.o
[ 33%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/ThreadLocalState.cpp.o
[ 33%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-6f6m7l4c4s4r-minmax-neon-acc2.c.o
[ 33%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/Utils.cpp.o
[ 33%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-6f6m7l4c4s4r-minmax-neon.c.o
[ 33%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-6f6m7l8c4s4r-minmax-neon-acc2.c.o
[ 33%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-6f6m7l8c4s4r-minmax-neon.c.o
[ 33%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-8f8m9l4c4s4r-minmax-neon-acc2.c.o
[ 33%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-8f8m9l4c4s4r-minmax-neon.c.o
[ 33%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-8f8m9l8c4s4r-minmax-neon-acc2.c.o
[ 33%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-8f8m9l8c4s4r-minmax-neon.c.o
[ 33%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-9p4c-minmax-neon-acc2.c.o
[ 33%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-9p4c-minmax-neon.c.o
[ 33%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-9p8c-minmax-neon-acc2.c.o
[ 33%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-9p8c-minmax-neon.c.o
[ 33%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-9p16c-minmax-neon-acc2.c.o
[ 33%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-9p16c-minmax-neon.c.o
[ 33%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-25p4c-minmax-neon-acc2.c.o
[ 33%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/Version.cpp.o
[ 33%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-25p4c-minmax-neon.c.o
[ 33%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-25p8c-minmax-neon-acc2.c.o
[ 33%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-25p8c-minmax-neon.c.o
[ 33%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/VmapModeRegistrations.cpp.o
[ 33%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-25p16c-minmax-neon-acc2.c.o
[ 33%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-25p16c-minmax-neon.c.o
[ 33%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-f16-vcvt/gen/f32-f16-vcvt-neon-u8.c.o
[ 33%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/ZeroTensorFallback.cpp.o
[ 33%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-f16-vcvt/gen/f32-f16-vcvt-neon-u16.c.o
[ 33%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-f16-vcvt/gen/f32-f16-vcvt-neon-u24.c.o
[ 33%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/autocast_mode.cpp.o
[ 33%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-f16-vcvt/gen/f32-f16-vcvt-neon-u32.c.o
[ 33%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gavgpool-cw/f32-gavgpool-cw-neon-u4.c.o
[ 33%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gavgpool/f32-gavgpool-7p7x-minmax-neon-c4.c.o
[ 33%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gavgpool/f32-gavgpool-7x-minmax-neon-c4.c.o
[ 33%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-1x8-minmax-neon-dup-ld64.c.o
[ 33%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-1x8-minmax-neon-lane-ld64.c.o
[ 33%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-1x8-minmax-neon-lane-ld128.c.o
[ 33%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-1x8s4-minmax-neon.c.o
[ 33%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-1x16-minmax-neon-lane-ld128.c.o
[ 33%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-2x16-minmax-neon-lane-ld128.c.o
[ 33%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-3x16-minmax-neon-lane-ld128.c.o
[ 33%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-4x2-minmax-neon-lane-ld64.c.o
[ 33%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-4x8-minmax-neon-dup-ld64.c.o
[ 33%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-4x8-minmax-neon-dup-ld128.c.o
[ 33%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-4x8-minmax-neon-lane-ld64.c.o
[ 34%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-4x8-minmax-neon-lane-ld128.c.o
[ 34%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-4x8s4-minmax-neon.c.o
[ 34%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-4x16-minmax-neon-lane-ld128.c.o
[ 34%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-5x8-minmax-neon-lane-ld64.c.o
[ 34%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-5x16-minmax-neon-lane-ld128.c.o
[ 34%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-6x2-minmax-neon-lane-ld64.c.o
[ 34%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-6x8-minmax-neon-dup-ld64.c.o
[ 34%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-6x8-minmax-neon-dup-ld128.c.o
[ 34%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/cpu/FlushDenormal.cpp.o
[ 34%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-6x8-minmax-neon-lane-ld64.c.o
[ 34%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/cpu/Utils.cpp.o
[ 34%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/detail/CPUGuardImpl.cpp.o
[ 34%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-6x8-minmax-neon-lane-ld128.c.o
[ 34%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-6x8s4-minmax-neon.c.o
[ 34%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/detail/CUDAHooksInterface.cpp.o
[ 34%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-6x16-minmax-neon-lane-ld128.c.o
[ 34%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-8x8s4-minmax-neon.c.o
[ 34%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/detail/HIPHooksInterface.cpp.o
[ 34%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemminc/gen/f32-gemminc-1x8-minmax-neon-dup-ld64.c.o
[ 34%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/detail/IPUHooksInterface.cpp.o
[ 34%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemminc/gen/f32-gemminc-1x8-minmax-neon-lane-ld64.c.o
[ 34%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemminc/gen/f32-gemminc-1x8-minmax-neon-lane-ld128.c.o
[ 34%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemminc/gen/f32-gemminc-1x8s4-minmax-neon.c.o
[ 34%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemminc/gen/f32-gemminc-4x8-minmax-neon-dup-ld64.c.o
[ 34%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemminc/gen/f32-gemminc-4x8-minmax-neon-dup-ld128.c.o
[ 34%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/detail/MPSHooksInterface.cpp.o
[ 34%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemminc/gen/f32-gemminc-4x8-minmax-neon-lane-ld64.c.o
[ 34%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/detail/MTIAHooksInterface.cpp.o
[ 34%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemminc/gen/f32-gemminc-4x8-minmax-neon-lane-ld128.c.o
[ 34%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemminc/gen/f32-gemminc-4x8s4-minmax-neon.c.o
[ 34%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemminc/gen/f32-gemminc-5x8-minmax-neon-lane-ld64.c.o
[ 34%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/detail/MetaGuardImpl.cpp.o
[ 34%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemminc/gen/f32-gemminc-6x8-minmax-neon-dup-ld64.c.o
[ 34%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemminc/gen/f32-gemminc-6x8-minmax-neon-dup-ld128.c.o
[ 34%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/detail/ORTHooksInterface.cpp.o
[ 34%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/detail/PrivateUse1HooksInterface.cpp.o
[ 34%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemminc/gen/f32-gemminc-6x8-minmax-neon-lane-ld64.c.o
[ 34%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemminc/gen/f32-gemminc-6x8-minmax-neon-lane-ld128.c.o
[ 34%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemminc/gen/f32-gemminc-6x8s4-minmax-neon.c.o
[ 34%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/detail/XPUHooksInterface.cpp.o
[ 34%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemminc/gen/f32-gemminc-8x8s4-minmax-neon.c.o
[ 34%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-ibilinear-chw/gen/f32-ibilinear-chw-neon-p4.c.o
[ 34%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/functorch/ADInterpreters.cpp.o
[ 34%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-ibilinear-chw/gen/f32-ibilinear-chw-neon-p8.c.o
[ 34%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-ibilinear-chw/gen/f32-ibilinear-chw-neon-p16.c.o
[ 34%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-ibilinear/gen/f32-ibilinear-neon-c4.c.o
[ 34%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-ibilinear/gen/f32-ibilinear-neon-c8.c.o
[ 34%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-1x8-minmax-neon-dup-ld64.c.o
[ 34%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/functorch/BatchRulesActivation.cpp.o
[ 34%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-1x8-minmax-neon-lane-ld64.c.o
[ 34%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-1x8-minmax-neon-lane-ld128.c.o
[ 34%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-1x8s4-minmax-neon.c.o
[ 34%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-1x16-minmax-neon-lane-ld128.c.o
[ 34%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-2x16-minmax-neon-lane-ld128.c.o
[ 34%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-3x16-minmax-neon-lane-ld128.c.o
[ 34%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-4x2-minmax-neon-lane-ld64.c.o
[ 34%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-4x4-minmax-neon-lane-ld64.c.o
[ 34%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-4x8-minmax-neon-dup-ld64.c.o
[ 34%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-4x8-minmax-neon-dup-ld128.c.o
[ 34%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-4x8-minmax-neon-lane-ld64.c.o
[ 34%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-4x8-minmax-neon-lane-ld128.c.o
[ 34%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-4x8s4-minmax-neon.c.o
[ 34%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-4x16-minmax-neon-lane-ld128.c.o
[ 34%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-5x16-minmax-neon-lane-ld128.c.o
[ 34%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/functorch/BatchRulesBinaryOps.cpp.o
[ 34%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/functorch/BatchRulesConvolution.cpp.o
[ 34%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-6x2-minmax-neon-lane-ld64.c.o
[ 34%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-6x8-minmax-neon-dup-ld64.c.o
[ 34%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-6x8-minmax-neon-dup-ld128.c.o
[ 34%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-6x8-minmax-neon-lane-ld64.c.o
[ 34%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-6x8-minmax-neon-lane-ld128.c.o
[ 34%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-6x8s4-minmax-neon.c.o
[ 34%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-6x16-minmax-neon-lane-ld128.c.o
[ 34%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-8x8s4-minmax-neon.c.o
[ 34%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-maxpool/f32-maxpool-9p8x-minmax-neon-c4.c.o
[ 34%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-pavgpool/f32-pavgpool-9p8x-minmax-neon-c4.c.o
[ 35%] Building C object
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-pavgpool/f32-pavgpool-9x-minmax-neon-c4.c.o [ 35%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-ppmm/gen/f32-ppmm-4x8-minmax-neon-prfm.c.o [ 35%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-ppmm/gen/f32-ppmm-4x8-minmax-neon.c.o [ 35%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-ppmm/gen/f32-ppmm-4x16-minmax-neon-prfm.c.o [ 35%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-ppmm/gen/f32-ppmm-4x16-minmax-neon.c.o [ 35%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-ppmm/gen/f32-ppmm-8x8-minmax-neon-prfm.c.o [ 35%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-ppmm/gen/f32-ppmm-8x8-minmax-neon.c.o [ 35%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-prelu/gen/f32-prelu-neon-1x4.c.o [ 35%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-prelu/gen/f32-prelu-neon-1x8.c.o [ 35%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-prelu/gen/f32-prelu-neon-1x16.c.o [ 35%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-prelu/gen/f32-prelu-neon-2x4.c.o [ 35%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-prelu/gen/f32-prelu-neon-2x8.c.o [ 35%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/functorch/BatchRulesDecompositions.cpp.o [ 35%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-prelu/gen/f32-prelu-neon-2x16.c.o [ 35%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-prelu/gen/f32-prelu-neon-4x4.c.o [ 35%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-prelu/gen/f32-prelu-neon-4x8.c.o [ 35%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-prelu/gen/f32-prelu-neon-4x16.c.o [ 35%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc4w-gemm/gen/f32-qc4w-gemm-1x8-minmax-neon-dup-ld64.c.o [ 35%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc4w-gemm/gen/f32-qc4w-gemm-1x8-minmax-neon-lane-ld64.c.o [ 35%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc4w-gemm/gen/f32-qc4w-gemm-4x8-minmax-neon-dup-ld64.c.o [ 35%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc4w-gemm/gen/f32-qc4w-gemm-4x8-minmax-neon-lane-ld64.c.o [ 35%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc4w-gemm/gen/f32-qc4w-gemm-5x8-minmax-neon-lane-ld64.c.o [ 35%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc4w-gemm/gen/f32-qc4w-gemm-6x8-minmax-neon-dup-ld64.c.o [ 35%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc4w-gemm/gen/f32-qc4w-gemm-6x8-minmax-neon-lane-ld64.c.o [ 35%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-1x8-minmax-neon-dup-ld64.c.o [ 35%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-1x8-minmax-neon-lane-ld64.c.o [ 35%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-4x2-minmax-neon-lane-ld64.c.o [ 35%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-4x8-minmax-neon-dup-ld64.c.o [ 35%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-4x8-minmax-neon-lane-ld64.c.o [ 35%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-5x8-minmax-neon-lane-ld64.c.o [ 35%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-6x2-minmax-neon-lane-ld64.c.o [ 35%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-6x8-minmax-neon-dup-ld64.c.o [ 35%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-6x8-minmax-neon-lane-ld64.c.o [ 35%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/functorch/BatchRulesDynamic.cpp.o [ 35%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qs8-vcvt/gen/f32-qs8-vcvt-neon-u8.c.o [ 35%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qs8-vcvt/gen/f32-qs8-vcvt-neon-u16.c.o [ 35%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qs8-vcvt/gen/f32-qs8-vcvt-neon-u24.c.o [ 35%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qs8-vcvt/gen/f32-qs8-vcvt-neon-u32.c.o [ 35%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qu8-vcvt/gen/f32-qu8-vcvt-neon-u8.c.o [ 35%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qu8-vcvt/gen/f32-qu8-vcvt-neon-u16.c.o [ 35%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qu8-vcvt/gen/f32-qu8-vcvt-neon-u24.c.o [ 35%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qu8-vcvt/gen/f32-qu8-vcvt-neon-u32.c.o [ 35%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddstoreexpminusmax/gen/f32-raddstoreexpminusmax-neon-rr2-lut64-p2-u4.c.o [ 35%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddstoreexpminusmax/gen/f32-raddstoreexpminusmax-neon-rr2-lut64-p2-u8-acc2.c.o [ 35%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddstoreexpminusmax/gen/f32-raddstoreexpminusmax-neon-rr2-lut64-p2-u8.c.o [ 35%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddstoreexpminusmax/gen/f32-raddstoreexpminusmax-neon-rr2-lut64-p2-u12-acc2.c.o [ 35%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddstoreexpminusmax/gen/f32-raddstoreexpminusmax-neon-rr2-lut64-p2-u12-acc3.c.o [ 35%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddstoreexpminusmax/gen/f32-raddstoreexpminusmax-neon-rr2-lut64-p2-u12.c.o [ 35%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddstoreexpminusmax/gen/f32-raddstoreexpminusmax-neon-rr2-lut64-p2-u16-acc2.c.o [ 35%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddstoreexpminusmax/gen/f32-raddstoreexpminusmax-neon-rr2-lut64-p2-u16-acc4.c.o [ 35%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddstoreexpminusmax/gen/f32-raddstoreexpminusmax-neon-rr2-lut64-p2-u16.c.o [ 35%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddstoreexpminusmax/gen/f32-raddstoreexpminusmax-neon-rr2-lut64-p2-u20-acc2.c.o [ 35%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddstoreexpminusmax/gen/f32-raddstoreexpminusmax-neon-rr2-lut64-p2-u20-acc5.c.o [ 35%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddstoreexpminusmax/gen/f32-raddstoreexpminusmax-neon-rr2-lut64-p2-u20.c.o [ 35%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddstoreexpminusmax/gen/f32-raddstoreexpminusmax-neon-rr2-p5-u4.c.o [ 35%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddstoreexpminusmax/gen/f32-raddstoreexpminusmax-neon-rr2-p5-u8-acc2.c.o [ 35%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddstoreexpminusmax/gen/f32-raddstoreexpminusmax-neon-rr2-p5-u8.c.o [ 35%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddstoreexpminusmax/gen/f32-raddstoreexpminusmax-neon-rr2-p5-u12-acc2.c.o [ 35%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddstoreexpminusmax/gen/f32-raddstoreexpminusmax-neon-rr2-p5-u12-acc3.c.o [ 35%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddstoreexpminusmax/gen/f32-raddstoreexpminusmax-neon-rr2-p5-u12.c.o [ 35%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddstoreexpminusmax/gen/f32-raddstoreexpminusmax-neon-rr2-p5-u16-acc2.c.o [ 36%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddstoreexpminusmax/gen/f32-raddstoreexpminusmax-neon-rr2-p5-u16-acc4.c.o [ 36%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddstoreexpminusmax/gen/f32-raddstoreexpminusmax-neon-rr2-p5-u16.c.o [ 36%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddstoreexpminusmax/gen/f32-raddstoreexpminusmax-neon-rr2-p5-u20-acc2.c.o [ 36%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddstoreexpminusmax/gen/f32-raddstoreexpminusmax-neon-rr2-p5-u20-acc5.c.o [ 36%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddstoreexpminusmax/gen/f32-raddstoreexpminusmax-neon-rr2-p5-u20.c.o [ 36%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-rminmax/gen/f32-rmax-neon-u4.c.o [ 36%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-rminmax/gen/f32-rmax-neon-u8-acc2.c.o [ 36%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-rminmax/gen/f32-rmax-neon-u12-acc3.c.o [ 36%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-rminmax/gen/f32-rmax-neon-u16-acc2.c.o [ 36%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-rminmax/gen/f32-rmax-neon-u16-acc4.c.o 
[ 36%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-rminmax/gen/f32-rmin-neon-u4.c.o [ 36%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-rminmax/gen/f32-rmin-neon-u8-acc2.c.o [ 36%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-rminmax/gen/f32-rmin-neon-u12-acc3.c.o [ 36%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-rminmax/gen/f32-rmin-neon-u16-acc2.c.o [ 36%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-rminmax/gen/f32-rmin-neon-u16-acc4.c.o [ 36%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-rminmax/gen/f32-rminmax-neon-u4.c.o [ 36%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-rminmax/gen/f32-rminmax-neon-u8-acc2.c.o [ 36%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-rminmax/gen/f32-rminmax-neon-u12-acc3.c.o [ 36%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-rminmax/gen/f32-rminmax-neon-u16-acc2.c.o [ 36%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-rminmax/gen/f32-rminmax-neon-u16-acc4.c.o [ 36%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-rsum/gen/f32-rsum-neon-u4.c.o [ 36%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-rsum/gen/f32-rsum-neon-u8-acc2.c.o [ 36%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-rsum/gen/f32-rsum-neon-u12-acc3.c.o [ 36%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-rsum/gen/f32-rsum-neon-u16-acc2.c.o [ 36%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-rsum/gen/f32-rsum-neon-u16-acc4.c.o [ 36%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-spmm/gen/f32-spmm-4x1-minmax-neon-pipelined.c.o [ 36%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-spmm/gen/f32-spmm-4x1-minmax-neon-x2.c.o [ 36%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-spmm/gen/f32-spmm-4x1-minmax-neon.c.o [ 36%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-spmm/gen/f32-spmm-8x1-minmax-neon-pipelined.c.o [ 36%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-spmm/gen/f32-spmm-8x1-minmax-neon-x2.c.o [ 36%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-spmm/gen/f32-spmm-8x1-minmax-neon.c.o [ 36%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-spmm/gen/f32-spmm-12x1-minmax-neon.c.o [ 36%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-spmm/gen/f32-spmm-16x1-minmax-neon-pipelined.c.o [ 36%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-spmm/gen/f32-spmm-16x1-minmax-neon-x2.c.o [ 36%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/functorch/BatchRulesFactory.cpp.o [ 36%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-spmm/gen/f32-spmm-16x1-minmax-neon.c.o [ 36%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-spmm/gen/f32-spmm-32x1-minmax-neon-pipelined.c.o [ 36%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-spmm/gen/f32-spmm-32x1-minmax-neon-x2.c.o [ 36%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-spmm/gen/f32-spmm-32x1-minmax-neon.c.o [ 36%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vadd-minmax-neon-u4.c.o [ 36%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vadd-minmax-neon-u8.c.o [ 36%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vaddc-minmax-neon-u4.c.o [ 36%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vaddc-minmax-neon-u8.c.o [ 36%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vmax-neon-u4.c.o [ 36%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vmax-neon-u8.c.o [ 36%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vmaxc-neon-u4.c.o [ 36%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vmaxc-neon-u8.c.o [ 36%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vmin-neon-u4.c.o [ 36%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vmin-neon-u8.c.o [ 36%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vminc-neon-u4.c.o [ 36%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vminc-neon-u8.c.o [ 36%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vmul-minmax-neon-u4.c.o [ 36%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vmul-minmax-neon-u8.c.o [ 36%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vmulc-minmax-neon-u4.c.o [ 36%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vmulc-minmax-neon-u8.c.o [ 36%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vrsubc-minmax-neon-u4.c.o [ 36%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vrsubc-minmax-neon-u8.c.o [ 36%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vsqrdiff-neon-u4.c.o [ 36%] Building CXX object 
caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/functorch/BatchRulesHelper.cpp.o [ 36%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vsqrdiff-neon-u8.c.o [ 36%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vsqrdiffc-neon-u4.c.o [ 36%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vsqrdiffc-neon-u8.c.o [ 37%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vsub-minmax-neon-u4.c.o [ 37%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/functorch/BatchRulesLinearAlgebra.cpp.o [ 37%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vsub-minmax-neon-u8.c.o [ 37%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vsubc-minmax-neon-u4.c.o [ 37%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vsubc-minmax-neon-u8.c.o [ 37%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vclamp/gen/f32-vclamp-neon-u4.c.o [ 37%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vclamp/gen/f32-vclamp-neon-u8.c.o [ 37%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vclamp/gen/f32-vclamp-neon-u16.c.o [ 37%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vcmul/gen/f32-vcmul-neon-u4.c.o [ 37%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vcmul/gen/f32-vcmul-neon-u8.c.o [ 37%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vcmul/gen/f32-vcmul-neon-u12.c.o [ 37%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vcmul/gen/f32-vcmul-neon-u16.c.o [ 37%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-velu/gen/f32-velu-neon-rr2-lut16-p3-u4.c.o [ 37%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-velu/gen/f32-velu-neon-rr2-lut16-p3-u8.c.o [ 37%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-velu/gen/f32-velu-neon-rr2-lut16-p3-u12.c.o [ 37%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-velu/gen/f32-velu-neon-rr2-lut16-p3-u16.c.o [ 37%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-velu/gen/f32-velu-neon-rr2-lut16-p3-u20.c.o [ 37%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-velu/gen/f32-velu-neon-rr2-lut16-p3-u24.c.o [ 37%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-velu/gen/f32-velu-neon-rr2-p6-u4.c.o [ 37%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-velu/gen/f32-velu-neon-rr2-p6-u8.c.o [ 37%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-velu/gen/f32-velu-neon-rr2-p6-u12.c.o [ 37%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-velu/gen/f32-velu-neon-rr2-p6-u16.c.o [ 37%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-velu/gen/f32-velu-neon-rr2-p6-u20.c.o [ 37%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-velu/gen/f32-velu-neon-rr2-p6-u24.c.o [ 37%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vhswish/gen/f32-vhswish-neon-u4.c.o [ 37%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vhswish/gen/f32-vhswish-neon-u8.c.o [ 37%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vhswish/gen/f32-vhswish-neon-u16.c.o [ 37%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vlrelu/gen/f32-vlrelu-neon-u4.c.o [ 37%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vlrelu/gen/f32-vlrelu-neon-u8.c.o [ 37%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vmulcaddc/gen/f32-vmulcaddc-c4-minmax-neon-2x.c.o [ 37%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vmulcaddc/gen/f32-vmulcaddc-c8-minmax-neon-2x.c.o [ 37%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vrelu/gen/f32-vrelu-neon-u4.c.o [ 37%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vrelu/gen/f32-vrelu-neon-u8.c.o [ 37%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vrnd/gen/f32-vrndd-neon-u4.c.o [ 37%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vrnd/gen/f32-vrndd-neon-u8.c.o [ 37%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vrnd/gen/f32-vrndne-neon-u4.c.o [ 37%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/functorch/BatchRulesLoss.cpp.o [ 37%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vrnd/gen/f32-vrndne-neon-u8.c.o [ 37%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vrnd/gen/f32-vrndu-neon-u4.c.o [ 37%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vrnd/gen/f32-vrndu-neon-u8.c.o [ 37%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/functorch/BatchRulesModules.cpp.o [ 37%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vrnd/gen/f32-vrndz-neon-u4.c.o [ 37%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vrnd/gen/f32-vrndz-neon-u8.c.o [ 37%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-neon-rr2-lut64-p2-nr2recps-u4.c.o [ 37%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-neon-rr2-lut64-p2-nr2recps-u8.c.o [ 37%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-neon-rr2-lut64-p2-nr2recps-u12.c.o [ 37%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-neon-rr2-lut64-p2-nr2recps-u16.c.o [ 37%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-neon-rr2-lut64-p2-nr2recps-u20.c.o [ 37%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-neon-rr2-lut64-p2-nr2recps-u24.c.o [ 37%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-neon-rr2-lut2048-p1-nr2recps-u4.c.o [ 37%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-neon-rr2-lut2048-p1-nr2recps-u8.c.o [ 37%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-neon-rr2-lut2048-p1-nr2recps-u12.c.o [ 37%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-neon-rr2-lut2048-p1-nr2recps-u16.c.o [ 37%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-neon-rr2-lut2048-p1-nr2recps-u20.c.o [ 37%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-neon-rr2-lut2048-p1-nr2recps-u24.c.o [ 37%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/functorch/BatchRulesNorm.cpp.o [ 37%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-neon-rr2-p5-nr2recps-u4.c.o [ 37%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-neon-rr2-p5-nr2recps-u8.c.o [ 37%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-neon-rr2-p5-nr2recps-u12.c.o [ 37%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-neon-rr2-p5-nr2recps-u16.c.o [ 37%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-neon-rr2-p5-nr2recps-u20.c.o [ 37%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-neon-rr2-p5-nr2recps-u24.c.o [ 37%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-neon-expm1minus-rr1-p6h5ts-nr2recps-u4.c.o [ 37%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-neon-expm1minus-rr1-p6h5ts-nr2recps-u8.c.o [ 38%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-neon-expm1minus-rr1-p6h5ts-nr2recps-u12.c.o [ 38%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-neon-expm1minus-rr1-p6h5ts-nr2recps-u16.c.o [ 38%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vunary/gen/f32-vabs-neon-u4.c.o [ 38%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vunary/gen/f32-vabs-neon-u8.c.o [ 38%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vunary/gen/f32-vneg-neon-u4.c.o [ 38%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vunary/gen/f32-vneg-neon-u8.c.o [ 38%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vunary/gen/f32-vsqr-neon-u4.c.o [ 38%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vunary/gen/f32-vsqr-neon-u8.c.o [ 38%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/i16-vlshift/gen/i16-vlshift-neon-u8.c.o [ 38%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/i16-vlshift/gen/i16-vlshift-neon-u16.c.o
[ 38%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/i16-vlshift/gen/i16-vlshift-neon-u24.c.o
[ 38%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/i16-vlshift/gen/i16-vlshift-neon-u32.c.o
[ 38%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f16-f32-cvt-neon-int16.c.o
[ 38%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f16-f32-cvt-neon-int32.c.o
[ 38%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-expm1minus-neon-rr2-lut16-p3.c.o
[ 38%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-expm1minus-neon-rr2-p6.c.o
[ 38%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-f16-cvt-neon.c.o
[ 38%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-qs8-cvt-neon.c.o
[ 38%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/functorch/BatchRulesPooling.cpp.o
[ 38%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-qu8-cvt-neon.c.o
[ 38%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-roundd-neon-addsub.c.o
[ 38%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-roundd-neon-cvt.c.o
[ 38%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-roundne-neon-addsub.c.o
[ 38%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-roundu-neon-addsub.c.o
[ 38%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-roundu-neon-cvt.c.o
[ 38%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-roundz-neon-addsub.c.o
[ 38%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-roundz-neon-cvt.c.o
[ 38%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-sigmoid-neon-rr2-lut64-p2-nr2recps.c.o
[ 38%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-sigmoid-neon-rr2-lut2048-p1-nr2recps.c.o
[ 38%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-sigmoid-neon-rr2-p5-nr2recps.c.o
[ 38%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-sqrt-neon-nr1rsqrts.c.o
[ 38%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-sqrt-neon-nr2rsqrts.c.o
[ 38%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-sqrt-neon-nr3rsqrts.c.o
[ 38%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-neon-expm1minus-rr1-p6h5ts-nr2recps.c.o
[ 38%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-neon-expm1minus-rr2-lut8-p4h2ts-nr2recps.c.o
[ 38%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/functorch/BatchRulesRandomness.cpp.o
[ 38%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-neon-expm1minus-rr2-lut8-p4h3ps-nr2recps.c.o
[ 38%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-1x16-minmax-neon-mlal-lane-prfm.c.o
[ 38%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-1x16-minmax-neon-mlal-lane.c.o
[ 38%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-2x16-minmax-neon-mlal-lane-prfm.c.o
[ 38%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-2x16-minmax-neon-mlal-lane.c.o
[ 38%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-3x16-minmax-neon-mlal-lane-prfm.c.o
[ 38%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-3x16-minmax-neon-mlal-lane.c.o
[ 38%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-4x16-minmax-neon-mlal-lane-prfm.c.o
[ 38%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-4x16-minmax-neon-mlal-lane.c.o
[ 38%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-6x16-minmax-neon-mlal-lane-prfm.c.o
[ 38%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-6x16-minmax-neon-mlal-lane.c.o
[ 38%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-1x8-minmax-neon-mlal-lane-prfm.c.o
[ 38%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/functorch/BatchRulesReduceOps.cpp.o
[ 38%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-1x8-minmax-neon-mlal-lane.c.o
[ 38%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-1x8c2s4-minmax-neon-mlal.c.o
[ 38%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-1x16-minmax-neon-mlal-lane-prfm.c.o
[ 38%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-1x16-minmax-neon-mlal-lane.c.o
[ 38%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-2x8-minmax-neon-mlal-lane-prfm.c.o
[ 38%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-2x8-minmax-neon-mlal-lane.c.o
[ 38%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-2x8c2s4-minmax-neon-mlal.c.o
[ 38%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-2x16-minmax-neon-mlal-lane-prfm.c.o
[ 38%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-2x16-minmax-neon-mlal-lane.c.o
[ 38%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-3x8-minmax-neon-mlal-lane-prfm.c.o
[ 38%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-3x8-minmax-neon-mlal-lane.c.o
[ 38%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-3x16-minmax-neon-mlal-lane-prfm.c.o
[ 38%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/functorch/BatchRulesScatterOps.cpp.o
[ 38%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-3x16-minmax-neon-mlal-lane.c.o
[ 39%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-4x8-minmax-neon-mlal-lane-prfm.c.o
[ 39%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-4x8-minmax-neon-mlal-lane.c.o
[ 39%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-4x16-minmax-neon-mlal-lane-prfm.c.o
[ 39%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-4x16-minmax-neon-mlal-lane.c.o
[ 39%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-6x8-minmax-neon-mlal-lane-prfm.c.o
[ 39%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-6x8-minmax-neon-mlal-lane.c.o
[ 39%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-6x16-minmax-neon-mlal-lane-prfm.c.o
[ 39%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-6x16-minmax-neon-mlal-lane.c.o
[ 39%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-1x8-minmax-neon-mlal-lane-prfm.c.o
[ 39%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-1x8-minmax-neon-mlal-lane.c.o
[ 39%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-1x8c2s4-minmax-neon-mlal.c.o
[ 39%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-1x16-minmax-neon-mlal-lane-prfm.c.o
[ 39%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-1x16-minmax-neon-mlal-lane.c.o
[ 39%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-2x8-minmax-neon-mlal-lane-prfm.c.o
[ 39%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-2x8-minmax-neon-mlal-lane.c.o
[ 39%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-2x8c2s4-minmax-neon-mlal.c.o
[ 39%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-2x16-minmax-neon-mlal-lane-prfm.c.o
[ 39%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-2x16-minmax-neon-mlal-lane.c.o
[ 39%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/functorch/BatchRulesUnaryOps.cpp.o
[ 39%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-3x8-minmax-neon-mlal-lane-prfm.c.o
[ 39%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-3x8-minmax-neon-mlal-lane.c.o
[ 39%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-3x16-minmax-neon-mlal-lane-prfm.c.o
[ 39%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-3x16-minmax-neon-mlal-lane.c.o
[ 39%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-4x8-minmax-neon-mlal-lane-prfm.c.o
[ 39%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-4x8-minmax-neon-mlal-lane.c.o
[ 39%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-4x16-minmax-neon-mlal-lane-prfm.c.o
[ 39%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-4x16-minmax-neon-mlal-lane.c.o
[ 39%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-6x8-minmax-neon-mlal-lane-prfm.c.o
[ 39%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/functorch/BatchRulesViews.cpp.o
[ 39%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-6x8-minmax-neon-mlal-lane.c.o
[ 39%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-6x16-minmax-neon-mlal-lane-prfm.c.o
[ 39%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-6x16-minmax-neon-mlal-lane.c.o
[ 39%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-5f5m5l8c8s8r-minmax-fp32-neon-mul16.c.o
[ 39%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-5f5m5l8c8s8r-minmax-rndnu-neon-mla8-ld64.c.o
[ 39%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-5f5m5l8c8s8r-minmax-rndnu-neon-mul8-ld64.c.o
[ 39%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-5f5m5l8c8s8r-minmax-rndnu-neon-mul16.c.o
[ 39%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-5f5m5l16c8s8r-minmax-fp32-neon-mul16.c.o
[ 39%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-5f5m5l16c8s8r-minmax-rndnu-neon-mla8-ld64.c.o
[ 39%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-5f5m5l16c8s8r-minmax-rndnu-neon-mla8-ld128.c.o
[ 39%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-5f5m5l16c8s8r-minmax-rndnu-neon-mul8-ld64.c.o
[ 39%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-5f5m5l16c8s8r-minmax-rndnu-neon-mul8-ld128.c.o
[ 39%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-5f5m5l16c8s8r-minmax-rndnu-neon-mul16.c.o
[ 39%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/functorch/BatchedFallback.cpp.o
[ 39%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-5f5m5l32c8s8r-minmax-fp32-neon-mul16.c.o
[ 39%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-5f5m5l32c8s8r-minmax-rndnu-neon-mul16.c.o
[ 39%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-6f6m7l8c8s8r-minmax-fp32-neon-mul16.c.o
[ 39%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-6f6m7l8c8s8r-minmax-rndnu-neon-mla8-ld64.c.o
[ 39%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-6f6m7l8c8s8r-minmax-rndnu-neon-mul8-ld64.c.o
[ 39%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-6f6m7l8c8s8r-minmax-rndnu-neon-mul16.c.o
[ 39%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-6f6m7l16c8s8r-minmax-fp32-neon-mul16.c.o
[ 39%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-6f6m7l16c8s8r-minmax-rndnu-neon-mla8-ld64.c.o
[ 39%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-6f6m7l16c8s8r-minmax-rndnu-neon-mla8-ld128.c.o
[ 39%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-6f6m7l16c8s8r-minmax-rndnu-neon-mul8-ld64.c.o
[ 39%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-6f6m7l16c8s8r-minmax-rndnu-neon-mul8-ld128.c.o
[ 39%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-6f6m7l16c8s8r-minmax-rndnu-neon-mul16.c.o
[ 39%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-6f6m7l32c8s8r-minmax-fp32-neon-mul16.c.o
[ 39%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-6f6m7l32c8s8r-minmax-rndnu-neon-mul16.c.o
[ 39%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-8f8m9l8c8s8r-minmax-fp32-neon-mul16.c.o
[ 39%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/functorch/BatchedTensorImpl.cpp.o
[ 39%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-8f8m9l8c8s8r-minmax-rndnu-neon-mla8-ld64.c.o
[ 39%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-8f8m9l8c8s8r-minmax-rndnu-neon-mul8-ld64.c.o
[ 39%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-8f8m9l8c8s8r-minmax-rndnu-neon-mul16.c.o
[ 39%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-8f8m9l16c8s8r-minmax-fp32-neon-mul16.c.o
[ 39%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-8f8m9l16c8s8r-minmax-rndnu-neon-mla8-ld64.c.o
[ 40%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-8f8m9l16c8s8r-minmax-rndnu-neon-mla8-ld128.c.o
[ 40%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-8f8m9l16c8s8r-minmax-rndnu-neon-mul8-ld64.c.o
[ 40%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-8f8m9l16c8s8r-minmax-rndnu-neon-mul8-ld128.c.o
[ 40%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/functorch/DynamicLayer.cpp.o
[ 40%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-8f8m9l16c8s8r-minmax-rndnu-neon-mul16.c.o
[ 40%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-8f8m9l32c8s8r-minmax-fp32-neon-mul16.c.o
[ 40%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-8f8m9l32c8s8r-minmax-rndnu-neon-mul16.c.o
[ 40%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/functorch/FunctionalizeInterpreter.cpp.o
[ 40%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/functorch/Interpreter.cpp.o
[ 40%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-9p8c-minmax-fp32-neon-mul16.c.o
[ 40%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-9p8c-minmax-rndnu-neon-mla8-ld64.c.o
[ 40%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-9p8c-minmax-rndnu-neon-mul8-ld64.c.o
[ 40%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-9p8c-minmax-rndnu-neon-mul16.c.o
[ 40%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-9p16c-minmax-fp32-neon-mul16.c.o
[ 40%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-9p16c-minmax-rndnu-neon-mla8-ld64.c.o
[ 40%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-9p16c-minmax-rndnu-neon-mla8-ld128.c.o
[ 40%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-9p16c-minmax-rndnu-neon-mul8-ld64.c.o
[ 40%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-9p16c-minmax-rndnu-neon-mul8-ld128.c.o
[ 40%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-9p16c-minmax-rndnu-neon-mul16.c.o
[ 40%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-9p32c-minmax-fp32-neon-mul16.c.o
[ 40%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-9p32c-minmax-rndnu-neon-mul16.c.o
[ 40%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-25p8c-minmax-fp32-neon-mul16.c.o
[ 40%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/functorch/LegacyBatchingRegistrations.cpp.o
[ 40%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-25p8c-minmax-rndnu-neon-mla8-ld64.c.o
[ 40%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-25p8c-minmax-rndnu-neon-mul8-ld64.c.o
[ 40%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-25p8c-minmax-rndnu-neon-mul16.c.o
[ 40%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/functorch/LegacyVmapTransforms.cpp.o
[ 40%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-25p16c-minmax-fp32-neon-mul16.c.o
[ 40%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-25p16c-minmax-rndnu-neon-mla8-ld64.c.o
[ 40%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-25p16c-minmax-rndnu-neon-mla8-ld128.c.o
[ 40%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-25p16c-minmax-rndnu-neon-mul8-ld64.c.o
[ 40%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-25p16c-minmax-rndnu-neon-mul8-ld128.c.o
[ 40%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-25p16c-minmax-rndnu-neon-mul16.c.o
[ 40%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-25p32c-minmax-fp32-neon-mul16.c.o
[ 40%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-25p32c-minmax-rndnu-neon-mul16.c.o
[ 40%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-f32-vcvt/gen/qs8-f32-vcvt-neon-u8.c.o
[ 40%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-f32-vcvt/gen/qs8-f32-vcvt-neon-u16.c.o
[ 40%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-f32-vcvt/gen/qs8-f32-vcvt-neon-u24.c.o
[ 40%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-f32-vcvt/gen/qs8-f32-vcvt-neon-u32.c.o
[ 40%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-gavgpool/gen/qs8-gavgpool-7p7x-minmax-fp32-neon-c8.c.o
[ 40%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-gavgpool/gen/qs8-gavgpool-7p7x-minmax-fp32-neon-c16.c.o
[ 40%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-gavgpool/gen/qs8-gavgpool-7p7x-minmax-fp32-neon-c24.c.o
[ 40%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/functorch/PlumbingHelper.cpp.o
[ 40%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-gavgpool/gen/qs8-gavgpool-7p7x-minmax-fp32-neon-c32.c.o
[ 40%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-gavgpool/gen/qs8-gavgpool-7p7x-minmax-rndnu-neon-c8.c.o
[ 40%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-gavgpool/gen/qs8-gavgpool-7p7x-minmax-rndnu-neon-c16.c.o
[ 40%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-gavgpool/gen/qs8-gavgpool-7p7x-minmax-rndnu-neon-c24.c.o
[ 40%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-gavgpool/gen/qs8-gavgpool-7p7x-minmax-rndnu-neon-c32.c.o
[ 40%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-gavgpool/gen/qs8-gavgpool-7x-minmax-fp32-neon-c8.c.o
[ 40%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-gavgpool/gen/qs8-gavgpool-7x-minmax-fp32-neon-c16.c.o
[ 40%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-gavgpool/gen/qs8-gavgpool-7x-minmax-fp32-neon-c24.c.o
[ 40%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-gavgpool/gen/qs8-gavgpool-7x-minmax-fp32-neon-c32.c.o
[ 40%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-gavgpool/gen/qs8-gavgpool-7x-minmax-rndnu-neon-c8.c.o
[ 40%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-gavgpool/gen/qs8-gavgpool-7x-minmax-rndnu-neon-c16.c.o
[ 40%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/functorch/PyTorchOperatorHacks.cpp.o
[ 40%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-gavgpool/gen/qs8-gavgpool-7x-minmax-rndnu-neon-c24.c.o
[ 40%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-gavgpool/gen/qs8-gavgpool-7x-minmax-rndnu-neon-c32.c.o
[ 40%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-3p8c-minmax-fp32-neon-mla8-ld64.c.o
[ 40%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-3p16c-minmax-fp32-neon-mla8-ld64.c.o
[ 40%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-3p16c-minmax-fp32-neon-mla8-ld128.c.o
[ 40%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-4p8c-minmax-fp32-neon-mla8-ld64.c.o
[ 41%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/functorch/TensorWrapper.cpp.o
[ 41%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-5f5m5l8c8s8r-minmax-fp32-neon-mla8-ld64.c.o
[ 41%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-5f5m5l8c8s8r-minmax-fp32-neon-mul8-ld64.c.o
[ 41%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-5f5m5l8c8s8r-minmax-fp32-neon-mul16.c.o
[ 41%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-5f5m5l16c8s8r-minmax-fp32-neon-mla8-ld64.c.o
[ 41%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-5f5m5l16c8s8r-minmax-fp32-neon-mla8-ld128.c.o
[ 41%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-5f5m5l16c8s8r-minmax-fp32-neon-mul8-ld64.c.o
[ 41%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/functorch/VmapInterpreter.cpp.o
[ 42%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-5f5m5l16c8s8r-minmax-fp32-neon-mul8-ld128.c.o
[ 42%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-5f5m5l16c8s8r-minmax-fp32-neon-mul16.c.o
[ 42%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-5f5m5l32c8s8r-minmax-fp32-neon-mul16.c.o
[ 42%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-6f6m7l8c8s8r-minmax-fp32-neon-mla8-ld64.c.o
[ 42%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-6f6m7l8c8s8r-minmax-fp32-neon-mul8-ld64.c.o
[ 42%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-6f6m7l8c8s8r-minmax-fp32-neon-mul16.c.o
[ 42%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-6f6m7l16c8s8r-minmax-fp32-neon-mla8-ld64.c.o
[ 42%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-6f6m7l16c8s8r-minmax-fp32-neon-mla8-ld128.c.o
[ 42%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-6f6m7l16c8s8r-minmax-fp32-neon-mul8-ld64.c.o
[ 42%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-6f6m7l16c8s8r-minmax-fp32-neon-mul8-ld128.c.o
[ 42%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-6f6m7l16c8s8r-minmax-fp32-neon-mul16.c.o
[ 42%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/functorch/VmapModeRegistrations.cpp.o
[ 42%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-6f6m7l32c8s8r-minmax-fp32-neon-mul16.c.o
[ 42%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-8f8m9l8c8s8r-minmax-fp32-neon-mla8-ld64.c.o
[ 42%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/record_function.cpp.o
[ 42%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-8f8m9l8c8s8r-minmax-fp32-neon-mul8-ld64.c.o
[ 42%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-8f8m9l8c8s8r-minmax-fp32-neon-mul16.c.o
[ 42%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-8f8m9l16c8s8r-minmax-fp32-neon-mla8-ld64.c.o
[ 42%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-8f8m9l16c8s8r-minmax-fp32-neon-mla8-ld128.c.o
[ 42%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/core/ATenGeneral.cpp.o
[ 42%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/core/BackendSelectFallbackKernel.cpp.o
[ 42%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-8f8m9l16c8s8r-minmax-fp32-neon-mul8-ld64.c.o
[ 42%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-8f8m9l16c8s8r-minmax-fp32-neon-mul8-ld128.c.o
[ 42%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-8f8m9l16c8s8r-minmax-fp32-neon-mul16.c.o
[ 42%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-8f8m9l32c8s8r-minmax-fp32-neon-mul16.c.o
[ 42%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-9p8c-minmax-fp32-neon-mla8-ld64.c.o
[ 42%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-9p8c-minmax-fp32-neon-mul8-ld64.c.o
[ 42%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-9p8c-minmax-fp32-neon-mul16.c.o
[ 42%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-9p16c-minmax-fp32-neon-mla8-ld64.c.o
[ 42%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-9p16c-minmax-fp32-neon-mla8-ld128.c.o
[ 42%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-9p16c-minmax-fp32-neon-mul8-ld64.c.o
[ 42%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-9p16c-minmax-fp32-neon-mul8-ld128.c.o
[ 42%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-9p16c-minmax-fp32-neon-mul16.c.o
[ 42%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-9p32c-minmax-fp32-neon-mul16.c.o
[ 42%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/core/DeprecatedTypeProperties.cpp.o
[ 42%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-25p8c-minmax-fp32-neon-mla8-ld64.c.o
[ 42%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/core/DeprecatedTypePropertiesRegistry.cpp.o
[ 42%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-25p8c-minmax-fp32-neon-mul8-ld64.c.o
[ 42%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-25p8c-minmax-fp32-neon-mul16.c.o
[ 42%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-25p16c-minmax-fp32-neon-mla8-ld64.c.o
[ 42%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/core/Dict.cpp.o
[ 42%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-25p16c-minmax-fp32-neon-mla8-ld128.c.o
[ 42%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-25p16c-minmax-fp32-neon-mul8-ld64.c.o
[ 42%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/core/Dimname.cpp.o
[ 42%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/core/Formatting.cpp.o
[ 42%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-25p16c-minmax-fp32-neon-mul8-ld128.c.o
[ 42%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-25p16c-minmax-fp32-neon-mul16.c.o
[ 42%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/core/Generator.cpp.o
[ 42%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-25p32c-minmax-fp32-neon-mul16.c.o
[ 42%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-1x8-minmax-fp32-neon-mlal-lane-prfm.c.o
[ 42%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-1x8-minmax-fp32-neon-mlal-lane.c.o
[ 42%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-1x8c2-minmax-fp32-neon-mlal-dup.c.o
[ 42%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-1x8c2-minmax-fp32-neon-mlal-ld1r.c.o
[ 42%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-1x8c2-minmax-fp32-neon-mlal-ld2r.c.o
[ 42%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/core/GeneratorForPrivateuseone.cpp.o
[ 42%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-1x8c2-minmax-fp32-neon-mlal-ld4r.c.o
[ 42%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-1x8c2s4-minmax-fp32-neon-mlal.c.o
[ 42%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-1x8c4-minmax-fp32-neon-mlal-dup.c.o
[ 42%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/core/List.cpp.o
[ 42%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-1x8c4-minmax-fp32-neon-mlal-ld1r.c.o
[ 42%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-1x8c4-minmax-fp32-neon-mlal-ld2r.c.o
[ 42%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-1x8c4s2-minmax-fp32-neon-mlal.c.o
[ 42%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-1x8c8-minmax-fp32-neon-mlal.c.o
[ 42%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-1x16-minmax-fp32-neon-mlal-lane-prfm.c.o
[ 42%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-1x16-minmax-fp32-neon-mlal-lane.c.o
[ 42%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/core/MetaFallbackKernel.cpp.o
[ 42%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-2x8-minmax-fp32-neon-mlal-lane-prfm.c.o
[ 42%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-2x8-minmax-fp32-neon-mlal-lane.c.o
[ 42%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-2x8c2-minmax-fp32-neon-mlal-dup.c.o
[ 42%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-2x8c2-minmax-fp32-neon-mlal-ld1r.c.o
[ 42%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-2x8c2-minmax-fp32-neon-mlal-ld2r.c.o
[ 42%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-2x8c2-minmax-fp32-neon-mlal-ld4r.c.o
[ 43%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-2x8c2s4-minmax-fp32-neon-mlal.c.o
[ 43%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-2x8c4-minmax-fp32-neon-mlal-dup.c.o
[ 43%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-2x8c4-minmax-fp32-neon-mlal-ld1r.c.o
[ 43%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-2x8c4-minmax-fp32-neon-mlal-ld2r.c.o
[ 43%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-2x8c4s2-minmax-fp32-neon-mlal.c.o
[ 43%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-2x8c8-minmax-fp32-neon-mlal.c.o
[ 43%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-2x16-minmax-fp32-neon-mlal-lane-prfm.c.o
[ 43%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-2x16-minmax-fp32-neon-mlal-lane.c.o
[ 43%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-3x8-minmax-fp32-neon-mlal-lane-prfm.c.o
[ 43%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-3x8-minmax-fp32-neon-mlal-lane.c.o
[ 43%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-3x16-minmax-fp32-neon-mlal-lane-prfm.c.o
[ 43%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-3x16-minmax-fp32-neon-mlal-lane.c.o
[ 43%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/core/NamedRegistrations.cpp.o
[ 43%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-4x8-minmax-fp32-neon-mlal-lane-prfm.c.o
[ 43%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-4x8-minmax-fp32-neon-mlal-lane.c.o
[ 43%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-4x16-minmax-fp32-neon-mlal-lane-prfm.c.o
[ 43%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-4x16-minmax-fp32-neon-mlal-lane.c.o
[ 43%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-6x8-minmax-fp32-neon-mlal-lane-prfm.c.o
[ 43%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-6x8-minmax-fp32-neon-mlal-lane.c.o
[ 43%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-6x16-minmax-fp32-neon-mlal-lane-prfm.c.o
[ 43%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-6x16-minmax-fp32-neon-mlal-lane.c.o
[ 43%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-1x8-minmax-fp32-neon-mlal-lane-prfm.c.o
[ 43%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-1x8-minmax-fp32-neon-mlal-lane.c.o
[ 43%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-1x8c2-minmax-fp32-neon-mlal-dup.c.o
[ 43%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-1x8c2-minmax-fp32-neon-mlal-ld1r.c.o
[ 43%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/core/NamedTensor.cpp.o
[ 43%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-1x8c2-minmax-fp32-neon-mlal-ld2r.c.o
[ 43%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-1x8c2-minmax-fp32-neon-mlal-ld4r.c.o
[ 43%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-1x8c2s4-minmax-fp32-neon-mlal.c.o
[ 43%] Building C object
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-1x8c4-minmax-fp32-neon-mlal-dup.c.o [ 43%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-1x8c4-minmax-fp32-neon-mlal-ld1r.c.o [ 43%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-1x8c4-minmax-fp32-neon-mlal-ld2r.c.o [ 43%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-1x8c4s2-minmax-fp32-neon-mlal.c.o [ 43%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-1x8c8-minmax-fp32-neon-mlal.c.o [ 43%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-1x16-minmax-fp32-neon-mlal-lane-prfm.c.o [ 43%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-1x16-minmax-fp32-neon-mlal-lane.c.o [ 43%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-2x8-minmax-fp32-neon-mlal-lane-prfm.c.o [ 43%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-2x8-minmax-fp32-neon-mlal-lane.c.o [ 43%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-2x8c2-minmax-fp32-neon-mlal-dup.c.o [ 43%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-2x8c2-minmax-fp32-neon-mlal-ld1r.c.o [ 43%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/core/NestedIntSymNodeImpl.cpp.o [ 43%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-2x8c2-minmax-fp32-neon-mlal-ld2r.c.o [ 43%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-2x8c2-minmax-fp32-neon-mlal-ld4r.c.o [ 43%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-2x8c2s4-minmax-fp32-neon-mlal.c.o [ 43%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-2x8c4-minmax-fp32-neon-mlal-dup.c.o [ 43%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-2x8c4-minmax-fp32-neon-mlal-ld1r.c.o [ 43%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-2x8c4-minmax-fp32-neon-mlal-ld2r.c.o [ 43%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-2x8c4s2-minmax-fp32-neon-mlal.c.o [ 43%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-2x8c8-minmax-fp32-neon-mlal.c.o [ 43%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/core/PythonFallbackKernel.cpp.o [ 43%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-2x16-minmax-fp32-neon-mlal-lane-prfm.c.o [ 43%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-2x16-minmax-fp32-neon-mlal-lane.c.o [ 43%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-3x8-minmax-fp32-neon-mlal-lane-prfm.c.o [ 43%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-3x8-minmax-fp32-neon-mlal-lane.c.o [ 43%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-3x16-minmax-fp32-neon-mlal-lane-prfm.c.o [ 43%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-3x16-minmax-fp32-neon-mlal-lane.c.o [ 43%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-4x8-minmax-fp32-neon-mlal-lane-prfm.c.o [ 43%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-4x8-minmax-fp32-neon-mlal-lane.c.o [ 43%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-4x16-minmax-fp32-neon-mlal-lane-prfm.c.o [ 43%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-4x16-minmax-fp32-neon-mlal-lane.c.o [ 43%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-6x8-minmax-fp32-neon-mlal-lane-prfm.c.o [ 43%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-6x8-minmax-fp32-neon-mlal-lane.c.o [ 43%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-6x16-minmax-fp32-neon-mlal-lane-prfm.c.o [ 43%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-6x16-minmax-fp32-neon-mlal-lane.c.o [ 44%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-requantization/qs8-requantization-fp32-neon.c.o [ 44%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/core/PythonOpRegistrationTrampoline.cpp.o [ 44%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-requantization/qs8-requantization-gemmlowp-neon.c.o [ 44%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-requantization/qs8-requantization-rndna-neon.c.o [ 44%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-requantization/qs8-requantization-rndnu-neon-mull.c.o [ 
44%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-requantization/qs8-requantization-rndnu-neon-qdmulh.c.o [ 44%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/core/Range.cpp.o [ 44%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vadd/gen/qs8-vadd-minmax-neon-ld64-u8.c.o [ 44%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/core/Tensor.cpp.o [ 44%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vadd/gen/qs8-vadd-minmax-neon-ld64-u16.c.o [ 44%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vadd/gen/qs8-vadd-minmax-neon-ld64-u24.c.o [ 44%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vadd/gen/qs8-vadd-minmax-neon-ld64-u32.c.o [ 44%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vadd/gen/qs8-vadd-minmax-neon-ld128-u16.c.o [ 44%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vadd/gen/qs8-vadd-minmax-neon-ld128-u32.c.o [ 44%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vaddc/gen/qs8-vaddc-minmax-neon-ld64-u8.c.o [ 44%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vaddc/gen/qs8-vaddc-minmax-neon-ld64-u16.c.o [ 44%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/core/TorchDispatchUtils.cpp.o [ 44%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vaddc/gen/qs8-vaddc-minmax-neon-ld64-u24.c.o [ 44%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vaddc/gen/qs8-vaddc-minmax-neon-ld64-u32.c.o [ 44%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vaddc/gen/qs8-vaddc-minmax-neon-ld128-u16.c.o [ 44%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vaddc/gen/qs8-vaddc-minmax-neon-ld128-u32.c.o [ 44%] Building CXX object 
caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/core/VariableFallbackKernel.cpp.o [ 44%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vcvt/gen/qs8-vcvt-neon-u8.c.o [ 44%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vcvt/gen/qs8-vcvt-neon-u16.c.o [ 44%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vcvt/gen/qs8-vcvt-neon-u32.c.o [ 44%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vhswish/gen/qs8-vhswish-neon-u8.c.o [ 44%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vhswish/gen/qs8-vhswish-neon-u16.c.o [ 44%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vhswish/gen/qs8-vhswish-neon-u32.c.o [ 44%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vlrelu/gen/qs8-vlrelu-neon-u8.c.o [ 44%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vlrelu/gen/qs8-vlrelu-neon-u16.c.o [ 44%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/core/VariableHooksInterface.cpp.o [ 44%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vlrelu/gen/qs8-vlrelu-neon-u32.c.o [ 44%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vmul/gen/qs8-vmul-minmax-fp32-neon-ld64-u8.c.o [ 44%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vmul/gen/qs8-vmul-minmax-fp32-neon-ld64-u16.c.o [ 44%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vmul/gen/qs8-vmul-minmax-fp32-neon-ld128-u16.c.o [ 44%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vmul/gen/qs8-vmul-minmax-rndnu-neon-ld64-u8.c.o [ 44%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vmul/gen/qs8-vmul-minmax-rndnu-neon-ld64-u16.c.o [ 44%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vmul/gen/qs8-vmul-minmax-rndnu-neon-ld128-u16.c.o [ 44%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/core/Vitals.cpp.o [ 44%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vmulc/gen/qs8-vmulc-minmax-fp32-neon-ld64-u8.c.o [ 44%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vmulc/gen/qs8-vmulc-minmax-fp32-neon-ld64-u16.c.o [ 44%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/core/adaption.cpp.o [ 44%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vmulc/gen/qs8-vmulc-minmax-fp32-neon-ld128-u16.c.o [ 44%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vmulc/gen/qs8-vmulc-minmax-rndnu-neon-ld64-u8.c.o [ 44%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/core/blob.cpp.o [ 44%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vmulc/gen/qs8-vmulc-minmax-rndnu-neon-ld64-u16.c.o [ 44%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/core/boxing/KernelFunction.cpp.o [ 44%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vmulc/gen/qs8-vmulc-minmax-rndnu-neon-ld128-u16.c.o [ 44%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs16-qs8-vcvt/gen/qs16-qs8-vcvt-neon-u8.c.o [ 44%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs16-qs8-vcvt/gen/qs16-qs8-vcvt-neon-u16.c.o [ 44%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs16-qs8-vcvt/gen/qs16-qs8-vcvt-neon-u32.c.o [ 44%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/core/class_type.cpp.o [ 44%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-avgpool/qu8-avgpool-9p8x-minmax-fp32-neon-c8.c.o [ 44%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-avgpool/qu8-avgpool-9x-minmax-fp32-neon-c8.c.o [ 44%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-5f5m5l8c8s8r-minmax-fp32-neon-mul16.c.o [ 44%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-5f5m5l8c8s8r-minmax-rndnu-neon-mul8.c.o [ 44%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-5f5m5l8c8s8r-minmax-rndnu-neon-mul16.c.o [ 44%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-5f5m5l16c8s8r-minmax-fp32-neon-mul16.c.o [ 44%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/core/custom_class.cpp.o [ 44%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-5f5m5l16c8s8r-minmax-rndnu-neon-mul8.c.o [ 44%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-5f5m5l16c8s8r-minmax-rndnu-neon-mul16.c.o [ 44%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-5f5m5l32c8s8r-minmax-fp32-neon-mul16.c.o [ 44%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/core/dispatch/DispatchKeyExtractor.cpp.o [ 44%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-5f5m5l32c8s8r-minmax-rndnu-neon-mul8.c.o [ 44%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-5f5m5l32c8s8r-minmax-rndnu-neon-mul16.c.o [ 44%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-6f6m7l8c8s8r-minmax-fp32-neon-mul16.c.o [ 44%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-6f6m7l8c8s8r-minmax-rndnu-neon-mul8.c.o [ 44%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-6f6m7l8c8s8r-minmax-rndnu-neon-mul16.c.o [ 44%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-6f6m7l16c8s8r-minmax-fp32-neon-mul16.c.o [ 44%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-6f6m7l16c8s8r-minmax-rndnu-neon-mul8.c.o [ 44%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-6f6m7l16c8s8r-minmax-rndnu-neon-mul16.c.o [ 44%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-6f6m7l32c8s8r-minmax-fp32-neon-mul16.c.o [ 44%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/core/dispatch/Dispatcher.cpp.o [ 44%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-6f6m7l32c8s8r-minmax-rndnu-neon-mul8.c.o [ 45%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-6f6m7l32c8s8r-minmax-rndnu-neon-mul16.c.o [ 45%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/core/dispatch/ObservedOperators.cpp.o [ 45%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-8f8m9l8c8s8r-minmax-fp32-neon-mul16.c.o [ 45%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-8f8m9l8c8s8r-minmax-rndnu-neon-mul8.c.o [ 45%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/core/dispatch/OperatorEntry.cpp.o [ 45%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-8f8m9l8c8s8r-minmax-rndnu-neon-mul16.c.o [ 45%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-8f8m9l16c8s8r-minmax-fp32-neon-mul16.c.o [ 45%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-8f8m9l16c8s8r-minmax-rndnu-neon-mul8.c.o [ 45%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/core/dynamic_type.cpp.o [ 45%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-8f8m9l16c8s8r-minmax-rndnu-neon-mul16.c.o [ 45%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-8f8m9l32c8s8r-minmax-fp32-neon-mul16.c.o [ 45%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-8f8m9l32c8s8r-minmax-rndnu-neon-mul8.c.o [ 45%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-8f8m9l32c8s8r-minmax-rndnu-neon-mul16.c.o [ 45%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-9p8c-minmax-fp32-neon-mul16.c.o [ 45%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-9p8c-minmax-rndnu-neon-mul8.c.o [ 45%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-9p8c-minmax-rndnu-neon-mul16.c.o [ 45%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-9p16c-minmax-fp32-neon-mul16.c.o [ 45%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-9p16c-minmax-rndnu-neon-mul8.c.o [ 45%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-9p16c-minmax-rndnu-neon-mul16.c.o [ 45%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-9p32c-minmax-fp32-neon-mul16.c.o [ 45%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-9p32c-minmax-rndnu-neon-mul8.c.o [ 45%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-9p32c-minmax-rndnu-neon-mul16.c.o [ 45%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-25p8c-minmax-fp32-neon-mul16.c.o [ 45%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-25p8c-minmax-rndnu-neon-mul8.c.o [ 45%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-25p8c-minmax-rndnu-neon-mul16.c.o [ 45%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-25p16c-minmax-fp32-neon-mul16.c.o [ 45%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-25p16c-minmax-rndnu-neon-mul8.c.o [ 45%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-25p16c-minmax-rndnu-neon-mul16.c.o [ 45%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/core/function_schema.cpp.o [ 45%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/core/interned_strings.cpp.o [ 45%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-25p32c-minmax-fp32-neon-mul16.c.o [ 45%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-25p32c-minmax-rndnu-neon-mul8.c.o [ 45%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-25p32c-minmax-rndnu-neon-mul16.c.o [ 45%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-f32-vcvt/gen/qu8-f32-vcvt-neon-u8.c.o [ 45%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-f32-vcvt/gen/qu8-f32-vcvt-neon-u16.c.o [ 45%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-f32-vcvt/gen/qu8-f32-vcvt-neon-u24.c.o [ 45%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-f32-vcvt/gen/qu8-f32-vcvt-neon-u32.c.o [ 45%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gavgpool/gen/qu8-gavgpool-7p7x-minmax-fp32-neon-c8.c.o [ 45%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gavgpool/gen/qu8-gavgpool-7p7x-minmax-fp32-neon-c16.c.o [ 45%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gavgpool/gen/qu8-gavgpool-7p7x-minmax-fp32-neon-c24.c.o [ 45%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gavgpool/gen/qu8-gavgpool-7p7x-minmax-fp32-neon-c32.c.o [ 45%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gavgpool/gen/qu8-gavgpool-7p7x-minmax-rndnu-neon-c8.c.o [ 45%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gavgpool/gen/qu8-gavgpool-7p7x-minmax-rndnu-neon-c16.c.o [ 45%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gavgpool/gen/qu8-gavgpool-7p7x-minmax-rndnu-neon-c24.c.o [ 45%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gavgpool/gen/qu8-gavgpool-7p7x-minmax-rndnu-neon-c32.c.o [ 45%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gavgpool/gen/qu8-gavgpool-7x-minmax-fp32-neon-c8.c.o [ 45%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gavgpool/gen/qu8-gavgpool-7x-minmax-fp32-neon-c16.c.o [ 45%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gavgpool/gen/qu8-gavgpool-7x-minmax-fp32-neon-c24.c.o [ 45%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gavgpool/gen/qu8-gavgpool-7x-minmax-fp32-neon-c32.c.o [ 45%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gavgpool/gen/qu8-gavgpool-7x-minmax-rndnu-neon-c8.c.o [ 45%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gavgpool/gen/qu8-gavgpool-7x-minmax-rndnu-neon-c16.c.o [ 45%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gavgpool/gen/qu8-gavgpool-7x-minmax-rndnu-neon-c24.c.o [ 45%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gavgpool/gen/qu8-gavgpool-7x-minmax-rndnu-neon-c32.c.o [ 45%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-1x8-minmax-fp32-neon-mlal-lane.c.o [ 45%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-1x8-minmax-rndnu-neon-mlal-lane.c.o [ 45%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-1x16-minmax-fp32-neon-mlal-lane.c.o [ 45%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/core/ivalue.cpp.o [ 45%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-1x16-minmax-rndnu-neon-mlal-lane.c.o [ 45%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-2x8-minmax-rndnu-neon-mlal-lane.c.o [ 45%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-2x16-minmax-rndnu-neon-mlal-lane.c.o [ 45%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-3x8-minmax-rndnu-neon-mlal-lane.c.o [ 45%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-3x16-minmax-rndnu-neon-mlal-lane.c.o [ 45%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-4x8-minmax-fp32-neon-mlal-lane.c.o [ 45%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-4x8-minmax-rndnu-neon-mlal-lane.c.o [ 45%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-4x16-minmax-fp32-neon-mlal-lane.c.o [ 46%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-4x16-minmax-rndnu-neon-mlal-lane.c.o [ 46%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-6x8-minmax-rndnu-neon-mlal-lane.c.o [ 46%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/core/library.cpp.o [ 46%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-6x16-minmax-rndnu-neon-mlal-lane.c.o [ 46%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-1x8-minmax-fp32-neon-mlal-lane.c.o [ 46%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-1x8-minmax-rndnu-neon-mlal-lane.c.o [ 46%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-1x16-minmax-fp32-neon-mlal-lane.c.o [ 46%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-1x16-minmax-rndnu-neon-mlal-lane.c.o [ 46%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-2x8-minmax-rndnu-neon-mlal-lane.c.o [ 46%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-2x16-minmax-rndnu-neon-mlal-lane.c.o [ 46%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-3x8-minmax-rndnu-neon-mlal-lane.c.o [ 46%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-3x16-minmax-rndnu-neon-mlal-lane.c.o [ 46%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-4x8-minmax-fp32-neon-mlal-lane.c.o [ 46%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-4x8-minmax-rndnu-neon-mlal-lane.c.o
[ 46%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-4x16-minmax-fp32-neon-mlal-lane.c.o
[ 46%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-4x16-minmax-rndnu-neon-mlal-lane.c.o
[ 46%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-6x8-minmax-rndnu-neon-mlal-lane.c.o
[ 46%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-6x16-minmax-rndnu-neon-mlal-lane.c.o
[ 46%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-requantization/qu8-requantization-fp32-neon.c.o
[ 46%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-requantization/qu8-requantization-gemmlowp-neon.c.o
[ 46%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-requantization/qu8-requantization-rndna-neon.c.o
[ 46%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vadd/gen/qu8-vadd-minmax-neon-ld64-u8.c.o
[ 46%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vadd/gen/qu8-vadd-minmax-neon-ld64-u16.c.o
[ 46%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vadd/gen/qu8-vadd-minmax-neon-ld64-u32.c.o
[ 46%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vadd/gen/qu8-vadd-minmax-neon-ld128-u16.c.o
[ 46%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vaddc/gen/qu8-vaddc-minmax-neon-ld64-u8.c.o
[ 46%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vaddc/gen/qu8-vaddc-minmax-neon-ld64-u16.c.o
[ 46%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vaddc/gen/qu8-vaddc-minmax-neon-ld64-u32.c.o
[ 46%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vaddc/gen/qu8-vaddc-minmax-neon-ld128-u16.c.o
[ 46%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vcvt/gen/qu8-vcvt-neon-u8.c.o
[ 46%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vcvt/gen/qu8-vcvt-neon-u16.c.o
[ 46%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vcvt/gen/qu8-vcvt-neon-u32.c.o
[ 46%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vhswish/gen/qu8-vhswish-neon-u8.c.o
[ 46%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vhswish/gen/qu8-vhswish-neon-u16.c.o
[ 46%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vhswish/gen/qu8-vhswish-neon-u32.c.o
[ 46%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vlrelu/gen/qu8-vlrelu-neon-u8.c.o
[ 46%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vlrelu/gen/qu8-vlrelu-neon-u16.c.o
[ 46%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vlrelu/gen/qu8-vlrelu-neon-u32.c.o
[ 46%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vmul/gen/qu8-vmul-minmax-fp32-neon-ld64-u8.c.o
[ 46%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vmul/gen/qu8-vmul-minmax-fp32-neon-ld64-u16.c.o
[ 46%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vmul/gen/qu8-vmul-minmax-fp32-neon-ld128-u16.c.o
[ 46%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vmul/gen/qu8-vmul-minmax-rndnu-neon-ld64-u8.c.o
[ 46%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vmul/gen/qu8-vmul-minmax-rndnu-neon-ld64-u16.c.o
[ 46%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vmul/gen/qu8-vmul-minmax-rndnu-neon-ld128-u16.c.o
[ 46%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vmulc/gen/qu8-vmulc-minmax-fp32-neon-ld64-u8.c.o
[ 46%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vmulc/gen/qu8-vmulc-minmax-fp32-neon-ld64-u16.c.o
[ 46%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vmulc/gen/qu8-vmulc-minmax-fp32-neon-ld128-u16.c.o
[ 46%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vmulc/gen/qu8-vmulc-minmax-rndnu-neon-ld64-u8.c.o
[ 46%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vmulc/gen/qu8-vmulc-minmax-rndnu-neon-ld64-u16.c.o
[ 46%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/core/op_registration/infer_schema.cpp.o
[ 46%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vmulc/gen/qu8-vmulc-minmax-rndnu-neon-ld128-u16.c.o
[ 46%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/s8-ibilinear/gen/s8-ibilinear-neon-c8.c.o
[ 46%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/s8-ibilinear/gen/s8-ibilinear-neon-c16.c.o
[ 46%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/s8-maxpool/s8-maxpool-2p2x-minmax-neon-c16.c.o
[ 46%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/s8-maxpool/s8-maxpool-4p3x-minmax-neon-c16.c.o
[ 46%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/s8-maxpool/s8-maxpool-9p8x-minmax-neon-c16.c.o
[ 46%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/s8-vclamp/s8-vclamp-neon-u64.c.o
[ 46%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/s16-rmaxabs/gen/s16-rmaxabs-neon-x8.c.o
[ 46%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/s16-rmaxabs/gen/s16-rmaxabs-neon-x16.c.o
[ 46%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/s16-rmaxabs/gen/s16-rmaxabs-neon-x24.c.o
[ 46%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/s16-rmaxabs/gen/s16-rmaxabs-neon-x32.c.o
[ 46%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/s16-window/gen/s16-window-neon-u8.c.o
[ 47%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/s16-window/gen/s16-window-neon-u16.c.o
[ 47%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/s16-window/gen/s16-window-neon-u24.c.o
[ 47%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/s16-window/gen/s16-window-neon-u32.c.o
[ 47%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/s16-window/gen/s16-window-shift12-neon-u8.c.o
[ 47%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/s16-window/gen/s16-window-shift12-neon-u16.c.o
[ 47%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/s16-window/gen/s16-window-shift12-neon-u24.c.o
[ 47%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/core/op_registration/op_registration.cpp.o
[ 47%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/s16-window/gen/s16-window-shift12-neon-u32.c.o
[ 47%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/s16-window/gen/s16-window-shift15-neon-u8.c.o
[ 47%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/s16-window/gen/s16-window-shift15-neon-u16.c.o
[ 47%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/s16-window/gen/s16-window-shift15-neon-u24.c.o
[ 47%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/s16-window/gen/s16-window-shift15-neon-u32.c.o
[ 47%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/u8-ibilinear/gen/u8-ibilinear-neon-c8.c.o
[ 47%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/u8-ibilinear/gen/u8-ibilinear-neon-c16.c.o
[ 47%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/u8-maxpool/u8-maxpool-9p8x-minmax-neon-c16.c.o
[ 47%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/u8-rmax/u8-rmax-neon-u16.c.o
[ 47%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/u8-vclamp/u8-vclamp-neon-u64.c.o
[ 47%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/u32-filterbank-accumulate/gen/u32-filterbank-accumulate-neon-x1.c.o
[ 47%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/u32-filterbank-accumulate/gen/u32-filterbank-accumulate-neon-x2.c.o
[ 47%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x8-transposec/gen/x8-transposec-8x8-multi-dec-zip-neon.c.o
[ 47%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x8-transposec/gen/x8-transposec-8x8-multi-mov-zip-neon.c.o
[ 47%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x8-transposec/gen/x8-transposec-8x8-multi-switch-zip-neon.c.o
[ 47%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x8-transposec/gen/x8-transposec-8x8-reuse-dec-zip-neon.c.o
[ 47%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x8-transposec/gen/x8-transposec-8x8-reuse-mov-zip-neon.c.o
[ 47%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x8-transposec/gen/x8-transposec-8x8-reuse-multi-zip-neon.c.o
[ 47%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x8-transposec/gen/x8-transposec-8x8-reuse-switch-zip-neon.c.o
[ 47%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x8-transposec/gen/x8-transposec-16x16-reuse-dec-zip-neon.c.o
[ 47%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x8-transposec/gen/x8-transposec-16x16-reuse-mov-zip-neon.c.o
[ 47%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x8-transposec/gen/x8-transposec-16x16-reuse-switch-zip-neon.c.o
[ 47%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x8-zip/x8-zip-x2-neon.c.o
[ 47%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x8-zip/x8-zip-x3-neon.c.o
[ 47%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x8-zip/x8-zip-x4-neon.c.o
[ 47%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x8-zip/x8-zip-xm-neon.c.o
[ 47%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x16-packw/gen/x16-packw-x8-gemm-goi-neon-ld4lane-u4-prfm.c.o
[ 47%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x16-packw/gen/x16-packw-x8-gemm-goi-neon-ld4lane-u4.c.o
[ 47%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x16-packw/gen/x16-packw-x8-gemm-goi-neon-ld4lane-u8-prfm.c.o
[ 47%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x16-packw/gen/x16-packw-x8-gemm-goi-neon-ld4lane-u8.c.o
[ 47%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x16-packw/gen/x16-packw-x8-gemm-goi-neon-ld4lane-u12-prfm.c.o
[ 47%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x16-packw/gen/x16-packw-x8-gemm-goi-neon-ld4lane-u12.c.o
[ 47%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x16-packw/gen/x16-packw-x8-gemm-goi-neon-ld4lane-u16-prfm.c.o
[ 47%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x16-packw/gen/x16-packw-x8-gemm-goi-neon-ld4lane-u16.c.o
[ 47%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x16-packw/gen/x16-packw-x16-gemm-goi-neon-ld4lane-u4-prfm.c.o
[ 47%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x16-packw/gen/x16-packw-x16-gemm-goi-neon-ld4lane-u4.c.o
[ 47%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x16-packw/gen/x16-packw-x16-gemm-goi-neon-ld4lane-u8-prfm.c.o
[ 47%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x16-packw/gen/x16-packw-x16-gemm-goi-neon-ld4lane-u8.c.o
[ 47%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/core/operator_name.cpp.o
[ 47%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x16-packw/gen/x16-packw-x16-gemm-goi-neon-ld4lane-u12-prfm.c.o
[ 47%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/core/register_symbols.cpp.o
[ 47%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x16-packw/gen/x16-packw-x16-gemm-goi-neon-ld4lane-u12.c.o
[ 47%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/core/tensor_type.cpp.o
[ 47%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x16-packw/gen/x16-packw-x16-gemm-goi-neon-ld4lane-u16-prfm.c.o
[ 47%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x16-packw/gen/x16-packw-x16-gemm-goi-neon-ld4lane-u16.c.o
[ 47%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/core/type.cpp.o
[ 47%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x16-transposec/gen/x16-transposec-4x4-multi-dec-zip-neon.c.o
[ 47%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x16-transposec/gen/x16-transposec-4x4-multi-mov-zip-neon.c.o
[ 47%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x16-transposec/gen/x16-transposec-4x4-multi-multi-zip-neon.c.o
[ 47%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x16-transposec/gen/x16-transposec-4x4-multi-switch-zip-neon.c.o
[ 47%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x16-transposec/gen/x16-transposec-4x4-reuse-dec-zip-neon.c.o
[ 47%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x16-transposec/gen/x16-transposec-4x4-reuse-mov-zip-neon.c.o
[ 47%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x16-transposec/gen/x16-transposec-4x4-reuse-multi-zip-neon.c.o
[ 47%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x16-transposec/gen/x16-transposec-4x4-reuse-switch-zip-neon.c.o
[ 47%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x16-transposec/gen/x16-transposec-8x8-multi-dec-zip-neon.c.o
[ 47%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x16-transposec/gen/x16-transposec-8x8-multi-mov-zip-neon.c.o
[ 47%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x16-transposec/gen/x16-transposec-8x8-multi-switch-zip-neon.c.o
[ 47%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x16-transposec/gen/x16-transposec-8x8-reuse-dec-zip-neon.c.o
[ 48%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x16-transposec/gen/x16-transposec-8x8-reuse-mov-zip-neon.c.o
[ 48%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x16-transposec/gen/x16-transposec-8x8-reuse-multi-zip-neon.c.o
[ 48%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x16-transposec/gen/x16-transposec-8x8-reuse-switch-zip-neon.c.o
[ 48%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x24-transposec/x24-transposec-2x2-neon-tbl64.c.o
[ 48%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-packw/gen/x32-packw-x2-gemm-goi-neon-ld2lane-u2-prfm.c.o
[ 48%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-packw/gen/x32-packw-x2-gemm-goi-neon-ld2lane-u2.c.o
[ 48%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-packw/gen/x32-packw-x8-gemm-goi-neon-ld4lane-u4-prfm.c.o
[ 48%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-packw/gen/x32-packw-x8-gemm-goi-neon-ld4lane-u4.c.o
[ 48%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-packw/gen/x32-packw-x8-gemm-goi-neon-ld4lane-u8-prfm.c.o
[ 48%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-packw/gen/x32-packw-x8-gemm-goi-neon-ld4lane-u8.c.o
[ 48%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-packw/gen/x32-packw-x8s4-gemm-goi-neon-ld4lane-u4-prfm.c.o
[ 48%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-packw/gen/x32-packw-x8s4-gemm-goi-neon-ld4lane-u4.c.o
[ 48%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-packw/gen/x32-packw-x8s4-gemm-goi-neon-ld4lane-u8-prfm.c.o
[ 48%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-packw/gen/x32-packw-x8s4-gemm-goi-neon-ld4lane-u8.c.o
[ 48%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/core/type_factory.cpp.o
[ 48%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-packw/gen/x32-packw-x12-gemm-goi-neon-ld4lane-u4-prfm.c.o
[ 48%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-packw/gen/x32-packw-x12-gemm-goi-neon-ld4lane-u4.c.o
[ 48%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-packw/gen/x32-packw-x12-gemm-goi-neon-ld4lane-u8-prfm.c.o
[ 48%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-packw/gen/x32-packw-x12-gemm-goi-neon-ld4lane-u8.c.o
[ 48%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-packw/gen/x32-packw-x16-gemm-goi-neon-ld4lane-u4-prfm.c.o
[ 48%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-packw/gen/x32-packw-x16-gemm-goi-neon-ld4lane-u4.c.o
[ 48%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-packw/gen/x32-packw-x16-gemm-goi-neon-ld4lane-u8-prfm.c.o
[ 48%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-packw/gen/x32-packw-x16-gemm-goi-neon-ld4lane-u8.c.o
[ 48%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-packx/gen/x32-packx-4x-neon-st4-u4-prfm.c.o
[ 48%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-packx/gen/x32-packx-4x-neon-st4-u4.c.o
[ 48%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-packx/gen/x32-packx-4x-neon-st4-u8-prfm.c.o
[ 48%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-packx/gen/x32-packx-4x-neon-st4-u8.c.o
[ 48%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-packx/gen/x32-packx-8x-neon-st4-u4-prfm.c.o
[ 48%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-packx/gen/x32-packx-8x-neon-st4-u4.c.o
[ 48%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-packx/gen/x32-packx-8x-neon-st4-u8-prfm.c.o
[ 48%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-packx/gen/x32-packx-8x-neon-st4-u8.c.o
[ 48%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-transposec/gen/x32-transposec-2x2-multi-dec-zip-neon.c.o
[ 48%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-transposec/gen/x32-transposec-2x2-multi-mov-zip-neon.c.o
[ 48%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-transposec/gen/x32-transposec-2x2-multi-multi-zip-neon.c.o
[ 48%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/core/union_type.cpp.o
[ 48%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-transposec/gen/x32-transposec-2x2-multi-switch-zip-neon.c.o
[ 48%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-transposec/gen/x32-transposec-2x2-reuse-dec-zip-neon.c.o
[ 48%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-transposec/gen/x32-transposec-2x2-reuse-mov-zip-neon.c.o
[ 48%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-transposec/gen/x32-transposec-2x2-reuse-multi-zip-neon.c.o
[ 48%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-transposec/gen/x32-transposec-2x2-reuse-switch-zip-neon.c.o
[ 48%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-transposec/gen/x32-transposec-4x4-multi-dec-zip-neon.c.o
[ 48%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-transposec/gen/x32-transposec-4x4-multi-mov-zip-neon.c.o
[ 48%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/frontend/error_report.cpp.o
[ 48%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-transposec/gen/x32-transposec-4x4-multi-multi-zip-neon.c.o
[ 48%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-transposec/gen/x32-transposec-4x4-multi-switch-zip-neon.c.o
[ 48%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-transposec/gen/x32-transposec-4x4-reuse-dec-zip-neon.c.o
[ 48%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-transposec/gen/x32-transposec-4x4-reuse-mov-zip-neon.c.o
[ 48%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-transposec/gen/x32-transposec-4x4-reuse-multi-zip-neon.c.o
[ 48%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-transposec/gen/x32-transposec-4x4-reuse-switch-zip-neon.c.o
[ 48%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-unpool/x32-unpool-neon.c.o
[ 48%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-zip/x32-zip-x2-neon.c.o
[ 48%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-zip/x32-zip-x3-neon.c.o
[ 48%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-zip/x32-zip-x4-neon.c.o
[ 48%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-zip/x32-zip-xm-neon.c.o
[ 48%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x64-transposec/gen/x64-transposec-2x2-multi-dec-zip-neon.c.o
[ 48%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x64-transposec/gen/x64-transposec-2x2-multi-mov-zip-neon.c.o
[ 48%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x64-transposec/gen/x64-transposec-2x2-multi-multi-zip-neon.c.o
[ 48%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x64-transposec/gen/x64-transposec-2x2-multi-switch-zip-neon.c.o
[ 48%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/frontend/function_schema_parser.cpp.o
[ 48%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x64-transposec/gen/x64-transposec-2x2-reuse-dec-zip-neon.c.o
[ 48%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x64-transposec/gen/x64-transposec-2x2-reuse-mov-zip-neon.c.o
[ 48%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x64-transposec/gen/x64-transposec-2x2-reuse-multi-zip-neon.c.o
[ 48%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x64-transposec/gen/x64-transposec-2x2-reuse-switch-zip-neon.c.o
[ 49%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/xx-fill/xx-fill-neon-u64.c.o
[ 49%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/xx-pad/xx-pad-p16-neon-u16.c.o
[ 49%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-f32-vcvt/gen/f16-f32-vcvt-neonfp16-u8.c.o
[ 49%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-f32-vcvt/gen/f16-f32-vcvt-neonfp16-u16.c.o
[ 49%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-f32acc-rsum/gen/f16-f32acc-rsum-neonfp16-u4.c.o
[ 49%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-f32acc-rsum/gen/f16-f32acc-rsum-neonfp16-u8.c.o
[ 49%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-f32acc-rsum/gen/f16-f32acc-rsum-neonfp16-u16-acc2.c.o
[ 49%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-f32acc-rsum/gen/f16-f32acc-rsum-neonfp16-u24-acc3.c.o
[ 49%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-f32acc-rsum/gen/f16-f32acc-rsum-neonfp16-u32-acc2.c.o
[ 49%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-f32acc-rsum/gen/f16-f32acc-rsum-neonfp16-u32-acc4.c.o
[ 49%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-f16-vcvt/gen/f32-f16-vcvt-neonfp16-u8.c.o
[ 49%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-f16-vcvt/gen/f32-f16-vcvt-neonfp16-u16.c.o
[ 49%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f16-f32-cvt-neonfp16.c.o
[ 49%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-f16-cvt-neonfp16.c.o
[ 49%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/bf16-gemm/gen/bf16-gemm-1x4c8-minmax-neonfma-shland.c.o
[ 49%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/bf16-gemm/gen/bf16-gemm-1x4c8-minmax-neonfma-zip.c.o
[ 49%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/bf16-gemm/gen/bf16-gemm-2x4c8-minmax-neonfma-shland.c.o
[ 49%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/bf16-gemm/gen/bf16-gemm-2x4c8-minmax-neonfma-zip.c.o
[ 49%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/bf16-gemm/gen/bf16-gemm-3x4c8-minmax-neonfma-shland.c.o
[ 49%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/bf16-gemm/gen/bf16-gemm-3x4c8-minmax-neonfma-zip.c.o
[ 49%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/frontend/lexer.cpp.o
[ 49%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/bf16-gemm/gen/bf16-gemm-4x4c8-minmax-neonfma-shland.c.o
[ 49%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/bf16-gemm/gen/bf16-gemm-4x4c8-minmax-neonfma-zip.c.o
[ 49%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/bf16-gemm/gen/bf16-gemm-5x4c8-minmax-neonfma-shland.c.o
[ 49%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/bf16-gemm/gen/bf16-gemm-5x4c8-minmax-neonfma-zip.c.o
[ 49%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-3p4c-minmax-neonfma-acc2.c.o
[ 49%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-3p4c-minmax-neonfma.c.o
[ 49%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-3p8c-minmax-neonfma-acc2.c.o
[ 49%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-3p8c-minmax-neonfma.c.o
[ 49%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-3p16c-minmax-neonfma-acc2.c.o
[ 49%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-3p16c-minmax-neonfma.c.o
[ 49%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-4p4c-minmax-neonfma-acc2.c.o
[ 49%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-4p4c-minmax-neonfma.c.o
[ 49%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-4p8c-minmax-neonfma-acc2.c.o
[ 49%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-4p8c-minmax-neonfma.c.o
[ 49%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/frontend/schema_type_parser.cpp.o
[ 49%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-4p16c-minmax-neonfma-acc2.c.o
[ 49%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-4p16c-minmax-neonfma.c.o
[ 49%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-5f5m5l4c4s4r-minmax-neonfma-acc2.c.o
[ 49%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-5f5m5l4c4s4r-minmax-neonfma.c.o
[ 49%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-5f5m5l8c4s4r-minmax-neonfma-acc2.c.o
[ 49%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-5f5m5l8c4s4r-minmax-neonfma.c.o
[ 49%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-6f6m7l4c4s4r-minmax-neonfma-acc2.c.o
[ 49%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-6f6m7l4c4s4r-minmax-neonfma.c.o
[ 49%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-6f6m7l8c4s4r-minmax-neonfma-acc2.c.o
[ 49%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-6f6m7l8c4s4r-minmax-neonfma.c.o
[ 49%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-8f8m9l4c4s4r-minmax-neonfma-acc2.c.o
[ 49%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-8f8m9l4c4s4r-minmax-neonfma.c.o
[ 49%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-8f8m9l8c4s4r-minmax-neonfma-acc2.c.o
[ 49%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-8f8m9l8c4s4r-minmax-neonfma.c.o
[ 49%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/frontend/strtod.cpp.o
[ 49%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-9p4c-minmax-neonfma-acc2.c.o
[ 49%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-9p4c-minmax-neonfma.c.o
[ 49%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-9p8c-minmax-neonfma-acc2.c.o
[ 49%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/frontend/source_range.cpp.o
[ 49%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-9p8c-minmax-neonfma.c.o
[ 49%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-9p16c-minmax-neonfma-acc2.c.o
[ 49%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-9p16c-minmax-neonfma.c.o
[ 49%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-25p4c-minmax-neonfma-acc2.c.o
[ 49%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-25p4c-minmax-neonfma.c.o
[ 49%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-25p8c-minmax-neonfma-acc2.c.o
[ 49%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-25p8c-minmax-neonfma.c.o
[ 49%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-25p16c-minmax-neonfma-acc2.c.o
[ 49%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-25p16c-minmax-neonfma.c.o
[ 50%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-1x8-minmax-neonfma-dup-ld64.c.o
[ 50%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-1x8s4-minmax-neonfma.c.o
[ 50%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-4x8-minmax-neonfma-dup-ld64.c.o
[ 50%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-4x8-minmax-neonfma-dup-ld128.c.o
[ 50%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-4x8s4-minmax-neonfma.c.o
[ 50%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-6x8-minmax-neonfma-dup-ld64.c.o
[ 50%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-6x8-minmax-neonfma-dup-ld128.c.o
[ 50%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-6x8s4-minmax-neonfma.c.o
[ 50%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-8x8s4-minmax-neonfma.c.o
[ 50%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemminc/gen/f32-gemminc-1x8-minmax-neonfma-dup-ld64.c.o
[ 50%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemminc/gen/f32-gemminc-1x8s4-minmax-neonfma.c.o
[ 50%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemminc/gen/f32-gemminc-4x8-minmax-neonfma-dup-ld64.c.o
[ 50%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemminc/gen/f32-gemminc-4x8-minmax-neonfma-dup-ld128.c.o
[ 50%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemminc/gen/f32-gemminc-4x8s4-minmax-neonfma.c.o
[ 50%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemminc/gen/f32-gemminc-6x8-minmax-neonfma-dup-ld64.c.o
[ 50%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemminc/gen/f32-gemminc-6x8-minmax-neonfma-dup-ld128.c.o
[ 50%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemminc/gen/f32-gemminc-6x8s4-minmax-neonfma.c.o
[ 50%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemminc/gen/f32-gemminc-8x8s4-minmax-neonfma.c.o
[ 50%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-ibilinear-chw/gen/f32-ibilinear-chw-neonfma-p4.c.o
[ 50%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-ibilinear-chw/gen/f32-ibilinear-chw-neonfma-p8.c.o
[ 50%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-ibilinear-chw/gen/f32-ibilinear-chw-neonfma-p16.c.o
[ 50%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-ibilinear/gen/f32-ibilinear-neonfma-c4.c.o
[ 50%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-ibilinear/gen/f32-ibilinear-neonfma-c8.c.o
[ 50%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-1x8-minmax-neonfma-dup-ld64.c.o
[ 50%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/Activation.cpp.o
[ 50%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-1x8s4-minmax-neonfma.c.o
[ 50%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-4x8-minmax-neonfma-dup-ld64.c.o
[ 50%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-4x8-minmax-neonfma-dup-ld128.c.o
[ 50%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-4x8s4-minmax-neonfma.c.o
[ 50%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-6x8-minmax-neonfma-dup-ld64.c.o
[ 50%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-6x8-minmax-neonfma-dup-ld128.c.o
[ 50%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-6x8s4-minmax-neonfma.c.o
[ 50%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-8x8s4-minmax-neonfma.c.o
[ 50%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc4w-gemm/gen/f32-qc4w-gemm-1x8-minmax-neonfma-dup-ld64.c.o
[ 50%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc4w-gemm/gen/f32-qc4w-gemm-4x8-minmax-neonfma-dup-ld64.c.o
[ 50%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc4w-gemm/gen/f32-qc4w-gemm-6x8-minmax-neonfma-dup-ld64.c.o
[ 50%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-1x8-minmax-neonfma-dup-ld64.c.o
[ 50%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-1x8s4-minmax-neonfma.c.o
[ 50%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-4x8-minmax-neonfma-dup-ld64.c.o
[ 50%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-4x8s4-minmax-neonfma.c.o
[ 50%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-6x8-minmax-neonfma-dup-ld64.c.o
[ 50%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-6x8s4-minmax-neonfma.c.o
[ 50%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddstoreexpminusmax/gen/f32-raddstoreexpminusmax-neonfma-rr1-lut64-p2-u4.c.o
[ 50%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddstoreexpminusmax/gen/f32-raddstoreexpminusmax-neonfma-rr1-lut64-p2-u8-acc2.c.o
[ 50%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddstoreexpminusmax/gen/f32-raddstoreexpminusmax-neonfma-rr1-lut64-p2-u8.c.o
[ 50%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddstoreexpminusmax/gen/f32-raddstoreexpminusmax-neonfma-rr1-lut64-p2-u12-acc2.c.o
[ 50%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddstoreexpminusmax/gen/f32-raddstoreexpminusmax-neonfma-rr1-lut64-p2-u12-acc3.c.o
[ 50%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddstoreexpminusmax/gen/f32-raddstoreexpminusmax-neonfma-rr1-lut64-p2-u12.c.o
[ 50%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/AdaptiveAveragePooling.cpp.o
[ 50%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddstoreexpminusmax/gen/f32-raddstoreexpminusmax-neonfma-rr1-lut64-p2-u16-acc2.c.o
[ 50%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddstoreexpminusmax/gen/f32-raddstoreexpminusmax-neonfma-rr1-lut64-p2-u16-acc4.c.o
[ 50%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddstoreexpminusmax/gen/f32-raddstoreexpminusmax-neonfma-rr1-lut64-p2-u16.c.o
[ 50%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddstoreexpminusmax/gen/f32-raddstoreexpminusmax-neonfma-rr1-lut64-p2-u20-acc2.c.o
[ 50%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddstoreexpminusmax/gen/f32-raddstoreexpminusmax-neonfma-rr1-lut64-p2-u20-acc5.c.o
[ 50%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddstoreexpminusmax/gen/f32-raddstoreexpminusmax-neonfma-rr1-lut64-p2-u20.c.o
[ 50%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddstoreexpminusmax/gen/f32-raddstoreexpminusmax-neonfma-rr1-p5-u4.c.o
[ 50%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddstoreexpminusmax/gen/f32-raddstoreexpminusmax-neonfma-rr1-p5-u8-acc2.c.o
[ 50%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddstoreexpminusmax/gen/f32-raddstoreexpminusmax-neonfma-rr1-p5-u8.c.o
[ 50%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddstoreexpminusmax/gen/f32-raddstoreexpminusmax-neonfma-rr1-p5-u12-acc2.c.o
[ 50%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddstoreexpminusmax/gen/f32-raddstoreexpminusmax-neonfma-rr1-p5-u12-acc3.c.o
[ 50%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddstoreexpminusmax/gen/f32-raddstoreexpminusmax-neonfma-rr1-p5-u12.c.o
[ 50%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddstoreexpminusmax/gen/f32-raddstoreexpminusmax-neonfma-rr1-p5-u16-acc2.c.o
[ 51%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddstoreexpminusmax/gen/f32-raddstoreexpminusmax-neonfma-rr1-p5-u16-acc4.c.o
[ 51%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddstoreexpminusmax/gen/f32-raddstoreexpminusmax-neonfma-rr1-p5-u16.c.o
[ 51%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddstoreexpminusmax/gen/f32-raddstoreexpminusmax-neonfma-rr1-p5-u20-acc2.c.o
[ 51%] Building C object
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddstoreexpminusmax/gen/f32-raddstoreexpminusmax-neonfma-rr1-p5-u20-acc5.c.o [ 51%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddstoreexpminusmax/gen/f32-raddstoreexpminusmax-neonfma-rr1-p5-u20.c.o [ 51%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-spmm/gen/f32-spmm-4x1-minmax-neonfma-pipelined.c.o [ 51%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-spmm/gen/f32-spmm-4x1-minmax-neonfma-x2.c.o [ 51%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-spmm/gen/f32-spmm-4x1-minmax-neonfma.c.o [ 51%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-spmm/gen/f32-spmm-8x1-minmax-neonfma-pipelined.c.o [ 51%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/AdaptiveAveragePooling3d.cpp.o [ 51%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-spmm/gen/f32-spmm-8x1-minmax-neonfma-x2.c.o [ 51%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-spmm/gen/f32-spmm-8x1-minmax-neonfma.c.o [ 51%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-spmm/gen/f32-spmm-12x1-minmax-neonfma.c.o [ 51%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-spmm/gen/f32-spmm-16x1-minmax-neonfma-pipelined.c.o [ 51%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-spmm/gen/f32-spmm-16x1-minmax-neonfma-x2.c.o [ 51%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-spmm/gen/f32-spmm-16x1-minmax-neonfma.c.o [ 51%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-spmm/gen/f32-spmm-32x1-minmax-neonfma-pipelined.c.o [ 51%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-spmm/gen/f32-spmm-32x1-minmax-neonfma-x2.c.o [ 51%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-spmm/gen/f32-spmm-32x1-minmax-neonfma.c.o [ 51%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-velu/gen/f32-velu-neonfma-rr1-lut16-p3-u4.c.o [ 51%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/AdaptiveMaxPooling2d.cpp.o [ 51%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-velu/gen/f32-velu-neonfma-rr1-lut16-p3-u8.c.o [ 51%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-velu/gen/f32-velu-neonfma-rr1-lut16-p3-u12.c.o [ 51%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-velu/gen/f32-velu-neonfma-rr1-lut16-p3-u16.c.o [ 51%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-velu/gen/f32-velu-neonfma-rr1-lut16-p3-u20.c.o [ 51%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-velu/gen/f32-velu-neonfma-rr1-lut16-p3-u24.c.o [ 51%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-velu/gen/f32-velu-neonfma-rr1-p6-u4.c.o [ 51%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-velu/gen/f32-velu-neonfma-rr1-p6-u8.c.o [ 51%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-velu/gen/f32-velu-neonfma-rr1-p6-u12.c.o [ 51%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-velu/gen/f32-velu-neonfma-rr1-p6-u16.c.o [ 51%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-velu/gen/f32-velu-neonfma-rr1-p6-u20.c.o [ 51%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-velu/gen/f32-velu-neonfma-rr1-p6-u24.c.o [ 51%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vmulcaddc/gen/f32-vmulcaddc-c4-minmax-neonfma-2x.c.o [ 51%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vmulcaddc/gen/f32-vmulcaddc-c8-minmax-neonfma-2x.c.o [ 51%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-neonfma-rr1-lut64-p2-nr1recps1fma-u4.c.o [ 51%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-neonfma-rr1-lut64-p2-nr1recps1fma-u8.c.o [ 51%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-neonfma-rr1-lut64-p2-nr1recps1fma-u12.c.o [ 51%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-neonfma-rr1-lut64-p2-nr1recps1fma-u16.c.o [ 51%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-neonfma-rr1-lut64-p2-nr1recps1fma-u20.c.o [ 51%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-neonfma-rr1-lut64-p2-nr1recps1fma-u24.c.o [ 51%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/AdaptiveMaxPooling3d.cpp.o [ 51%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-neonfma-rr1-lut64-p2-nr2fma-u4.c.o [ 51%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-neonfma-rr1-lut64-p2-nr2fma-u8.c.o [ 51%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-neonfma-rr1-lut64-p2-nr2fma-u12.c.o [ 51%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-neonfma-rr1-lut64-p2-nr2fma-u16.c.o [ 51%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-neonfma-rr1-lut64-p2-nr2fma-u20.c.o [ 51%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-neonfma-rr1-lut64-p2-nr2fma-u24.c.o [ 51%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-neonfma-rr1-lut64-p2-nr2recps-u4.c.o [ 51%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-neonfma-rr1-lut64-p2-nr2recps-u8.c.o [ 51%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-neonfma-rr1-lut64-p2-nr2recps-u12.c.o [ 51%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/AffineGridGenerator.cpp.o [ 51%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-neonfma-rr1-lut64-p2-nr2recps-u16.c.o [ 51%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-neonfma-rr1-lut64-p2-nr2recps-u20.c.o [ 51%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-neonfma-rr1-lut64-p2-nr2recps-u24.c.o [ 51%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-neonfma-rr1-lut2048-p1-nr1recps1fma-u4.c.o [ 51%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-neonfma-rr1-lut2048-p1-nr1recps1fma-u8.c.o [ 51%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-neonfma-rr1-lut2048-p1-nr1recps1fma-u12.c.o [ 51%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-neonfma-rr1-lut2048-p1-nr1recps1fma-u16.c.o [ 51%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-neonfma-rr1-lut2048-p1-nr1recps1fma-u20.c.o [ 51%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-neonfma-rr1-lut2048-p1-nr1recps1fma-u24.c.o [ 51%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-neonfma-rr1-lut2048-p1-nr2fma-u4.c.o [ 51%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-neonfma-rr1-lut2048-p1-nr2fma-u8.c.o [ 51%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-neonfma-rr1-lut2048-p1-nr2fma-u12.c.o [ 52%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-neonfma-rr1-lut2048-p1-nr2fma-u16.c.o [ 52%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-neonfma-rr1-lut2048-p1-nr2fma-u20.c.o [ 52%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-neonfma-rr1-lut2048-p1-nr2fma-u24.c.o [ 52%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-neonfma-rr1-lut2048-p1-nr2recps-u4.c.o [ 53%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/AmpKernels.cpp.o [ 53%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-neonfma-rr1-lut2048-p1-nr2recps-u8.c.o [ 53%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-neonfma-rr1-lut2048-p1-nr2recps-u12.c.o [ 53%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-neonfma-rr1-lut2048-p1-nr2recps-u16.c.o [ 53%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-neonfma-rr1-lut2048-p1-nr2recps-u20.c.o [ 53%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-neonfma-rr1-lut2048-p1-nr2recps-u24.c.o [ 53%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-neonfma-rr1-p5-nr1recps1fma-u4.c.o [ 53%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-neonfma-rr1-p5-nr1recps1fma-u8.c.o [ 53%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-neonfma-rr1-p5-nr1recps1fma-u12.c.o [ 53%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-neonfma-rr1-p5-nr1recps1fma-u16.c.o [ 53%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-neonfma-rr1-p5-nr1recps1fma-u20.c.o [ 53%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-neonfma-rr1-p5-nr1recps1fma-u24.c.o [ 53%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-neonfma-rr1-p5-nr2fma-u4.c.o [ 53%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-neonfma-rr1-p5-nr2fma-u8.c.o [ 53%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-neonfma-rr1-p5-nr2fma-u12.c.o [ 53%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-neonfma-rr1-p5-nr2fma-u16.c.o [ 53%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/AutogradComposite.cpp.o [ 53%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-neonfma-rr1-p5-nr2fma-u20.c.o [ 53%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-neonfma-rr1-p5-nr2fma-u24.c.o [ 53%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-neonfma-rr1-p5-nr2recps-u4.c.o [ 53%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-neonfma-rr1-p5-nr2recps-u8.c.o [ 53%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-neonfma-rr1-p5-nr2recps-u12.c.o [ 53%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-neonfma-rr1-p5-nr2recps-u16.c.o [ 53%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-neonfma-rr1-p5-nr2recps-u20.c.o [ 53%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-neonfma-rr1-p5-nr2recps-u24.c.o [ 53%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/AveragePool2d.cpp.o [ 53%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsqrt/gen/f32-vsqrt-neonfma-nr1rsqrts1fma1adj-u4.c.o [ 53%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/AveragePool3d.cpp.o [ 53%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsqrt/gen/f32-vsqrt-neonfma-nr1rsqrts1fma1adj-u8.c.o [ 53%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsqrt/gen/f32-vsqrt-neonfma-nr1rsqrts1fma1adj-u16.c.o [ 53%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsqrt/gen/f32-vsqrt-neonfma-nr2fma1adj-u4.c.o [ 53%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsqrt/gen/f32-vsqrt-neonfma-nr2fma1adj-u8.c.o [ 53%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsqrt/gen/f32-vsqrt-neonfma-nr2fma1adj-u16.c.o [ 53%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-neonfma-expm1minus-rr1-lut8-p4h3ts-nr1recps1fma-u4.c.o [ 53%] 
Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-neonfma-expm1minus-rr1-lut8-p4h3ts-nr1recps1fma-u8.c.o [ 53%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-neonfma-expm1minus-rr1-lut8-p4h3ts-nr1recps1fma-u12.c.o [ 53%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-neonfma-expm1minus-rr1-lut8-p4h3ts-nr1recps1fma-u16.c.o [ 53%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/BatchLinearAlgebra.cpp.o [ 53%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-neonfma-expm1minus-rr1-lut8-p4h3ts-nr2fma-u4.c.o [ 53%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-neonfma-expm1minus-rr1-lut8-p4h3ts-nr2fma-u8.c.o [ 53%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-neonfma-expm1minus-rr1-lut8-p4h3ts-nr2fma-u12.c.o [ 53%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-neonfma-expm1minus-rr1-lut8-p4h3ts-nr2fma-u16.c.o [ 53%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-neonfma-expm1minus-rr1-p6h5ts-nr1recps1fma-u4.c.o [ 53%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-neonfma-expm1minus-rr1-p6h5ts-nr1recps1fma-u8.c.o [ 53%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-neonfma-expm1minus-rr1-p6h5ts-nr1recps1fma-u12.c.o [ 53%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-neonfma-expm1minus-rr1-p6h5ts-nr1recps1fma-u16.c.o [ 53%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-neonfma-expm1minus-rr1-p6h5ts-nr2fma-u4.c.o [ 53%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-neonfma-expm1minus-rr1-p6h5ts-nr2fma-u8.c.o [ 53%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/BatchLinearAlgebraKernel.cpp.o [ 53%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-neonfma-expm1minus-rr1-p6h5ts-nr2fma-u12.c.o [ 53%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-neonfma-expm1minus-rr1-p6h5ts-nr2fma-u16.c.o [ 53%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-neonfma-expm1minus-rr1-p6h5ts-nr2recps-u4.c.o [ 53%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/BinaryOps.cpp.o [ 53%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-neonfma-expm1minus-rr1-p6h5ts-nr2recps-u8.c.o [ 53%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-neonfma-expm1minus-rr1-p6h5ts-nr2recps-u12.c.o [ 53%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-neonfma-expm1minus-rr1-p6h5ts-nr2recps-u16.c.o [ 53%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-exp-neonfma-rr2-lut64-p2.c.o [ 53%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-exp-neonfma-rr2-p5.c.o [ 53%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-expm1minus-neonfma-rr1-lut16-p3.c.o [ 53%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-expm1minus-neonfma-rr1-p6.c.o [ 53%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-expminus-neonfma-rr2-lut64-p2.c.o [ 53%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-expminus-neonfma-rr2-lut2048-p1.c.o [ 53%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-expminus-neonfma-rr2-p5.c.o [ 54%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-sigmoid-neonfma-rr1-lut64-p2-nr1recps1fma.c.o [ 54%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-sigmoid-neonfma-rr1-lut64-p2-nr2fma.c.o [ 54%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-sigmoid-neonfma-rr1-lut64-p2-nr2recps.c.o [ 54%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-sigmoid-neonfma-rr1-lut2048-p1-nr1recps1fma.c.o [ 54%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-sigmoid-neonfma-rr1-lut2048-p1-nr2fma.c.o [ 54%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-sigmoid-neonfma-rr1-lut2048-p1-nr2recps.c.o [ 54%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-sigmoid-neonfma-rr1-p5-nr1recps1fma.c.o [ 54%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-sigmoid-neonfma-rr1-p5-nr2fma.c.o [ 54%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-sigmoid-neonfma-rr1-p5-nr2recps.c.o [ 54%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-sigmoid-neonfma-rr2-lut64-p2-nr1recps1fma.c.o [ 54%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-sigmoid-neonfma-rr2-lut64-p2-nr2fma.c.o [ 54%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-sigmoid-neonfma-rr2-lut64-p2-nr2recps.c.o [ 54%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-sigmoid-neonfma-rr2-lut2048-p1-nr1recps1fma.c.o [ 54%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-sigmoid-neonfma-rr2-lut2048-p1-nr2fma.c.o [ 54%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-sigmoid-neonfma-rr2-lut2048-p1-nr2recps.c.o [ 54%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-sigmoid-neonfma-rr2-p5-nr1recps1fma.c.o [ 54%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-sigmoid-neonfma-rr2-p5-nr2fma.c.o [ 54%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-sigmoid-neonfma-rr2-p5-nr2recps.c.o [ 54%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-sqrt-neonfma-nr1fma.c.o [ 54%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/Blas.cpp.o [ 54%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-sqrt-neonfma-nr1rsqrts1fma1adj.c.o [ 54%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/BlasKernel.cpp.o [ 54%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-sqrt-neonfma-nr2fma1adj.c.o [ 54%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-sqrt-neonfma-nr2fma.c.o [ 54%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-sqrt-neonfma-nr3fma.c.o [ 54%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-neonfma-expm1minus-rr1-lut8-p4h2ts-nr1recps1fma.c.o [ 54%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/Bucketization.cpp.o [ 54%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-neonfma-expm1minus-rr1-lut8-p4h2ts-nr2fma.c.o [ 54%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-neonfma-expm1minus-rr1-lut8-p4h2ts-nr2recps.c.o [ 54%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-neonfma-expm1minus-rr1-lut8-p4h3ps-nr1recps1fma.c.o [ 54%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-neonfma-expm1minus-rr1-lut8-p4h3ps-nr1recps1fmaadj.c.o [ 54%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-neonfma-expm1minus-rr1-lut8-p4h3ps-nr2fma.c.o [ 54%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-neonfma-expm1minus-rr1-lut8-p4h3ps-nr2fmaadj.c.o [ 54%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-neonfma-expm1minus-rr1-lut8-p4h3ps-nr2recps.c.o [ 54%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/CPUBlas.cpp.o [ 54%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-neonfma-expm1minus-rr1-lut8-p4h3ps-nr2recpsadj.c.o [ 54%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-neonfma-expm1minus-rr1-p6h5ts-nr1recps1fma.c.o [ 54%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-neonfma-expm1minus-rr1-p6h5ts-nr1recps1fmaadj.c.o [ 54%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-neonfma-expm1minus-rr1-p6h5ts-nr2fma.c.o [ 54%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-neonfma-expm1minus-rr1-p6h5ts-nr2fmaadj.c.o [ 54%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-neonfma-expm1minus-rr1-p6h5ts-nr2recps.c.o [ 54%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-neonfma-expm1minus-rr1-p6h5ts-nr2recpsadj.c.o [ 54%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qs8-vcvt/gen/f32-qs8-vcvt-neonv8-u8.c.o [ 54%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qs8-vcvt/gen/f32-qs8-vcvt-neonv8-u16.c.o [ 54%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qs8-vcvt/gen/f32-qs8-vcvt-neonv8-u24.c.o [ 54%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qs8-vcvt/gen/f32-qs8-vcvt-neonv8-u32.c.o [ 54%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qu8-vcvt/gen/f32-qu8-vcvt-neonv8-u8.c.o [ 54%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qu8-vcvt/gen/f32-qu8-vcvt-neonv8-u16.c.o [ 54%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qu8-vcvt/gen/f32-qu8-vcvt-neonv8-u24.c.o [ 54%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/CPUFallback.cpp.o [ 54%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qu8-vcvt/gen/f32-qu8-vcvt-neonv8-u32.c.o [ 54%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vrnd/gen/f32-vrndd-neonv8-u4.c.o [ 54%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vrnd/gen/f32-vrndd-neonv8-u8.c.o [ 54%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vrnd/gen/f32-vrndne-neonv8-u4.c.o [ 54%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vrnd/gen/f32-vrndne-neonv8-u8.c.o [ 54%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/ChanelShuffle.cpp.o [ 54%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vrnd/gen/f32-vrndu-neonv8-u4.c.o [ 54%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/Col2Im.cpp.o [ 54%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vrnd/gen/f32-vrndu-neonv8-u8.c.o [ 54%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vrnd/gen/f32-vrndz-neonv8-u4.c.o [ 54%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vrnd/gen/f32-vrndz-neonv8-u8.c.o [ 54%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-qs8-cvt-neonv8.c.o [ 54%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-qu8-cvt-neonv8.c.o [ 54%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-roundd-neonv8.c.o [ 54%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-roundne-neonv8.c.o [ 54%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-roundu-neonv8.c.o [ 54%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-roundz-neonv8.c.o [ 55%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-5f5m5l8c8s8r-minmax-fp32-neonv8-mul16.c.o [ 55%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-5f5m5l16c8s8r-minmax-fp32-neonv8-mul16.c.o [ 55%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-5f5m5l32c8s8r-minmax-fp32-neonv8-mul16.c.o [ 55%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-6f6m7l8c8s8r-minmax-fp32-neonv8-mul16.c.o [ 55%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/ComparisonUtils.cpp.o [ 55%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-6f6m7l16c8s8r-minmax-fp32-neonv8-mul16.c.o [ 55%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-6f6m7l32c8s8r-minmax-fp32-neonv8-mul16.c.o [ 55%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-8f8m9l8c8s8r-minmax-fp32-neonv8-mul16.c.o [ 55%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-8f8m9l16c8s8r-minmax-fp32-neonv8-mul16.c.o [ 55%] Building CXX object 
caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/Constraints.cpp.o [ 55%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/Convolution.cpp.o [ 55%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-8f8m9l32c8s8r-minmax-fp32-neonv8-mul16.c.o [ 55%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-9p8c-minmax-fp32-neonv8-mul16.c.o [ 55%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/ConvolutionMM2d.cpp.o [ 55%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-9p16c-minmax-fp32-neonv8-mul16.c.o [ 55%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-9p32c-minmax-fp32-neonv8-mul16.c.o [ 55%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-25p8c-minmax-fp32-neonv8-mul16.c.o [ 55%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-25p16c-minmax-fp32-neonv8-mul16.c.o [ 55%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-25p32c-minmax-fp32-neonv8-mul16.c.o [ 55%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-gavgpool/gen/qs8-gavgpool-7p7x-minmax-fp32-neonv8-c8.c.o [ 55%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-gavgpool/gen/qs8-gavgpool-7p7x-minmax-fp32-neonv8-c16.c.o [ 55%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-gavgpool/gen/qs8-gavgpool-7p7x-minmax-fp32-neonv8-c24.c.o [ 55%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/ConvolutionMM3d.cpp.o [ 55%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-gavgpool/gen/qs8-gavgpool-7p7x-minmax-fp32-neonv8-c32.c.o [ 55%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-gavgpool/gen/qs8-gavgpool-7x-minmax-fp32-neonv8-c8.c.o [ 55%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-gavgpool/gen/qs8-gavgpool-7x-minmax-fp32-neonv8-c16.c.o [ 55%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-gavgpool/gen/qs8-gavgpool-7x-minmax-fp32-neonv8-c24.c.o [ 55%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-gavgpool/gen/qs8-gavgpool-7x-minmax-fp32-neonv8-c32.c.o [ 55%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-3p8c-minmax-fp32-neonv8-mla8-ld64.c.o [ 55%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-3p16c-minmax-fp32-neonv8-mla8-ld64.c.o [ 55%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-3p16c-minmax-fp32-neonv8-mla8-ld128.c.o [ 55%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-5f5m5l8c8s8r-minmax-fp32-neonv8-mla8-ld64.c.o [ 55%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-5f5m5l8c8s8r-minmax-fp32-neonv8-mul8-ld64.c.o [ 55%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-5f5m5l8c8s8r-minmax-fp32-neonv8-mul16.c.o [ 55%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-5f5m5l16c8s8r-minmax-fp32-neonv8-mla8-ld64.c.o [ 55%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-5f5m5l16c8s8r-minmax-fp32-neonv8-mla8-ld128.c.o [ 55%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-5f5m5l16c8s8r-minmax-fp32-neonv8-mul8-ld64.c.o [ 55%] Building CXX object 
caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/ConvolutionTBC.cpp.o [ 55%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-5f5m5l16c8s8r-minmax-fp32-neonv8-mul8-ld128.c.o [ 55%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-5f5m5l16c8s8r-minmax-fp32-neonv8-mul16.c.o [ 55%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-5f5m5l32c8s8r-minmax-fp32-neonv8-mul16.c.o [ 55%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/Copy.cpp.o [ 55%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-6f6m7l8c8s8r-minmax-fp32-neonv8-mla8-ld64.c.o [ 55%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-6f6m7l8c8s8r-minmax-fp32-neonv8-mul8-ld64.c.o [ 55%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-6f6m7l8c8s8r-minmax-fp32-neonv8-mul16.c.o [ 55%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-6f6m7l16c8s8r-minmax-fp32-neonv8-mla8-ld64.c.o [ 55%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-6f6m7l16c8s8r-minmax-fp32-neonv8-mla8-ld128.c.o [ 55%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/Correlation.cpp.o [ 55%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-6f6m7l16c8s8r-minmax-fp32-neonv8-mul8-ld64.c.o [ 55%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-6f6m7l16c8s8r-minmax-fp32-neonv8-mul8-ld128.c.o [ 55%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-6f6m7l16c8s8r-minmax-fp32-neonv8-mul16.c.o [ 55%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-6f6m7l32c8s8r-minmax-fp32-neonv8-mul16.c.o [ 55%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/Cross.cpp.o [ 55%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-8f8m9l8c8s8r-minmax-fp32-neonv8-mla8-ld64.c.o [ 55%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-8f8m9l8c8s8r-minmax-fp32-neonv8-mul8-ld64.c.o [ 55%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-8f8m9l8c8s8r-minmax-fp32-neonv8-mul16.c.o [ 55%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-8f8m9l16c8s8r-minmax-fp32-neonv8-mla8-ld64.c.o [ 55%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-8f8m9l16c8s8r-minmax-fp32-neonv8-mla8-ld128.c.o [ 55%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-8f8m9l16c8s8r-minmax-fp32-neonv8-mul8-ld64.c.o [ 55%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-8f8m9l16c8s8r-minmax-fp32-neonv8-mul8-ld128.c.o [ 55%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-8f8m9l16c8s8r-minmax-fp32-neonv8-mul16.c.o [ 55%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/DilatedMaxPool2d.cpp.o [ 55%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/DilatedMaxPool3d.cpp.o [ 55%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-8f8m9l32c8s8r-minmax-fp32-neonv8-mul16.c.o [ 55%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-9p8c-minmax-fp32-neonv8-mla8-ld64.c.o [ 55%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/DispatchStub.cpp.o [ 55%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-9p8c-minmax-fp32-neonv8-mul8-ld64.c.o [ 55%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-9p8c-minmax-fp32-neonv8-mul16.c.o [ 55%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-9p16c-minmax-fp32-neonv8-mla8-ld64.c.o [ 55%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/Distance.cpp.o [ 55%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-9p16c-minmax-fp32-neonv8-mla8-ld128.c.o [ 55%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-9p16c-minmax-fp32-neonv8-mul8-ld64.c.o [ 56%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-9p16c-minmax-fp32-neonv8-mul8-ld128.c.o [ 56%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-9p16c-minmax-fp32-neonv8-mul16.c.o [ 56%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-9p32c-minmax-fp32-neonv8-mul16.c.o [ 56%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-25p8c-minmax-fp32-neonv8-mla8-ld64.c.o [ 56%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-25p8c-minmax-fp32-neonv8-mul8-ld64.c.o [ 56%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/Distributions.cpp.o [ 56%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-25p8c-minmax-fp32-neonv8-mul16.c.o [ 56%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/Dropout.cpp.o [ 56%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-25p16c-minmax-fp32-neonv8-mla8-ld64.c.o [ 56%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-25p16c-minmax-fp32-neonv8-mla8-ld128.c.o [ 56%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-25p16c-minmax-fp32-neonv8-mul8-ld64.c.o [ 56%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-25p16c-minmax-fp32-neonv8-mul8-ld128.c.o [ 56%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-25p16c-minmax-fp32-neonv8-mul16.c.o [ 56%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-25p32c-minmax-fp32-neonv8-mul16.c.o [ 56%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-1x8-minmax-fp32-neonv8-mlal-lane-prfm.c.o [ 56%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-1x8-minmax-fp32-neonv8-mlal-lane.c.o [ 56%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/Embedding.cpp.o [ 56%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-1x8c2-minmax-fp32-neonv8-mlal-dup.c.o [ 56%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-1x8c2-minmax-fp32-neonv8-mlal-ld1r.c.o [ 56%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-1x8c2-minmax-fp32-neonv8-mlal-ld2r.c.o [ 56%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-1x8c2-minmax-fp32-neonv8-mlal-ld4r.c.o [ 56%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-1x8c2s4-minmax-fp32-neonv8-mlal.c.o [ 56%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-1x8c4-minmax-fp32-neonv8-mlal-dup.c.o [ 56%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-1x8c4-minmax-fp32-neonv8-mlal-ld1r.c.o [ 56%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/EmbeddingBag.cpp.o [ 56%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-1x8c4-minmax-fp32-neonv8-mlal-ld2r.c.o [ 56%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-1x8c4s2-minmax-fp32-neonv8-mlal.c.o [ 56%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-1x8c8-minmax-fp32-neonv8-mlal.c.o [ 56%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-1x16-minmax-fp32-neonv8-mlal-lane-prfm.c.o [ 56%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-1x16-minmax-fp32-neonv8-mlal-lane.c.o [ 56%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-2x8-minmax-fp32-neonv8-mlal-lane-prfm.c.o [ 56%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-2x8-minmax-fp32-neonv8-mlal-lane.c.o [ 56%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-2x8c2-minmax-fp32-neonv8-mlal-dup.c.o [ 56%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-2x8c2-minmax-fp32-neonv8-mlal-ld1r.c.o [ 56%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-2x8c2-minmax-fp32-neonv8-mlal-ld2r.c.o [ 56%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/Fill.cpp.o [ 56%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-2x8c2-minmax-fp32-neonv8-mlal-ld4r.c.o [ 56%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-2x8c2s4-minmax-fp32-neonv8-mlal.c.o [ 56%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-2x8c4-minmax-fp32-neonv8-mlal-dup.c.o [ 56%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/ForeachOpsKernels.cpp.o [ 56%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-2x8c4-minmax-fp32-neonv8-mlal-ld1r.c.o [ 56%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-2x8c4-minmax-fp32-neonv8-mlal-ld2r.c.o [ 56%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-2x8c4s2-minmax-fp32-neonv8-mlal.c.o [ 56%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-2x8c8-minmax-fp32-neonv8-mlal.c.o [ 56%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-2x16-minmax-fp32-neonv8-mlal-lane-prfm.c.o [ 56%] Building C 
object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-2x16-minmax-fp32-neonv8-mlal-lane.c.o [ 56%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-3x8-minmax-fp32-neonv8-mlal-lane-prfm.c.o [ 56%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-3x8-minmax-fp32-neonv8-mlal-lane.c.o [ 56%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-3x16-minmax-fp32-neonv8-mlal-lane-prfm.c.o [ 56%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-3x16-minmax-fp32-neonv8-mlal-lane.c.o [ 56%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-4x8-minmax-fp32-neonv8-mlal-lane-prfm.c.o [ 56%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/FractionalMaxPool2d.cpp.o [ 56%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-4x8-minmax-fp32-neonv8-mlal-lane.c.o [ 56%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-4x16-minmax-fp32-neonv8-mlal-lane-prfm.c.o [ 56%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-4x16-minmax-fp32-neonv8-mlal-lane.c.o [ 56%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-6x8-minmax-fp32-neonv8-mlal-lane-prfm.c.o [ 56%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-6x8-minmax-fp32-neonv8-mlal-lane.c.o [ 56%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/FractionalMaxPool3d.cpp.o [ 56%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-6x16-minmax-fp32-neonv8-mlal-lane-prfm.c.o [ 56%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-6x16-minmax-fp32-neonv8-mlal-lane.c.o [ 56%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-1x8-minmax-fp32-neonv8-mlal-lane-prfm.c.o [ 56%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/FunctionOfAMatrixUtils.cpp.o [ 56%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-1x8-minmax-fp32-neonv8-mlal-lane.c.o [ 56%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-1x8c2-minmax-fp32-neonv8-mlal-dup.c.o [ 56%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-1x8c2-minmax-fp32-neonv8-mlal-ld1r.c.o [ 56%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/GatedLinearUnit.cpp.o [ 56%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-1x8c2-minmax-fp32-neonv8-mlal-ld2r.c.o [ 56%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-1x8c2-minmax-fp32-neonv8-mlal-ld4r.c.o [ 56%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-1x8c2s4-minmax-fp32-neonv8-mlal.c.o [ 56%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-1x8c4-minmax-fp32-neonv8-mlal-dup.c.o [ 57%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-1x8c4-minmax-fp32-neonv8-mlal-ld1r.c.o [ 57%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-1x8c4-minmax-fp32-neonv8-mlal-ld2r.c.o [ 57%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-1x8c4s2-minmax-fp32-neonv8-mlal.c.o [ 57%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-1x8c8-minmax-fp32-neonv8-mlal.c.o [ 57%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-1x16-minmax-fp32-neonv8-mlal-lane-prfm.c.o [ 57%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-1x16-minmax-fp32-neonv8-mlal-lane.c.o [ 57%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-2x8-minmax-fp32-neonv8-mlal-lane-prfm.c.o [ 57%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-2x8-minmax-fp32-neonv8-mlal-lane.c.o [ 57%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-2x8c2-minmax-fp32-neonv8-mlal-dup.c.o [ 57%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-2x8c2-minmax-fp32-neonv8-mlal-ld1r.c.o [ 57%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/GridSampler.cpp.o [ 57%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-2x8c2-minmax-fp32-neonv8-mlal-ld2r.c.o [ 57%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-2x8c2-minmax-fp32-neonv8-mlal-ld4r.c.o [ 57%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-2x8c2s4-minmax-fp32-neonv8-mlal.c.o [ 57%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-2x8c4-minmax-fp32-neonv8-mlal-dup.c.o [ 57%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-2x8c4-minmax-fp32-neonv8-mlal-ld1r.c.o [ 57%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-2x8c4-minmax-fp32-neonv8-mlal-ld2r.c.o [ 57%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/Histogram.cpp.o [ 57%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-2x8c4s2-minmax-fp32-neonv8-mlal.c.o [ 57%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-2x8c8-minmax-fp32-neonv8-mlal.c.o [ 57%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-2x16-minmax-fp32-neonv8-mlal-lane-prfm.c.o [ 57%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-2x16-minmax-fp32-neonv8-mlal-lane.c.o [ 57%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-3x8-minmax-fp32-neonv8-mlal-lane-prfm.c.o [ 57%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-3x8-minmax-fp32-neonv8-mlal-lane.c.o [ 57%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-3x16-minmax-fp32-neonv8-mlal-lane-prfm.c.o [ 57%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-3x16-minmax-fp32-neonv8-mlal-lane.c.o [ 57%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-4x8-minmax-fp32-neonv8-mlal-lane-prfm.c.o [ 57%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-4x8-minmax-fp32-neonv8-mlal-lane.c.o [ 57%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-4x16-minmax-fp32-neonv8-mlal-lane-prfm.c.o [ 57%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-4x16-minmax-fp32-neonv8-mlal-lane.c.o [ 57%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-6x8-minmax-fp32-neonv8-mlal-lane-prfm.c.o [ 57%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-6x8-minmax-fp32-neonv8-mlal-lane.c.o [ 57%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-6x16-minmax-fp32-neonv8-mlal-lane-prfm.c.o [ 57%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-6x16-minmax-fp32-neonv8-mlal-lane.c.o [ 57%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vmul/gen/qs8-vmul-minmax-fp32-neonv8-ld64-u8.c.o [ 57%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vmul/gen/qs8-vmul-minmax-fp32-neonv8-ld64-u16.c.o [ 57%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vmul/gen/qs8-vmul-minmax-fp32-neonv8-ld128-u16.c.o [ 57%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vmulc/gen/qs8-vmulc-minmax-fp32-neonv8-ld64-u8.c.o [ 57%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vmulc/gen/qs8-vmulc-minmax-fp32-neonv8-ld64-u16.c.o [ 57%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vmulc/gen/qs8-vmulc-minmax-fp32-neonv8-ld128-u16.c.o [ 57%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-5f5m5l8c8s8r-minmax-fp32-neonv8-mul16.c.o [ 57%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-5f5m5l16c8s8r-minmax-fp32-neonv8-mul16.c.o [ 57%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-5f5m5l32c8s8r-minmax-fp32-neonv8-mul16.c.o [ 57%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-6f6m7l8c8s8r-minmax-fp32-neonv8-mul16.c.o [ 57%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-6f6m7l16c8s8r-minmax-fp32-neonv8-mul16.c.o [ 57%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-6f6m7l32c8s8r-minmax-fp32-neonv8-mul16.c.o [ 57%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-8f8m9l8c8s8r-minmax-fp32-neonv8-mul16.c.o [ 57%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-8f8m9l16c8s8r-minmax-fp32-neonv8-mul16.c.o [ 57%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-8f8m9l32c8s8r-minmax-fp32-neonv8-mul16.c.o [ 57%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-9p8c-minmax-fp32-neonv8-mul16.c.o [ 57%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/Im2Col.cpp.o [ 57%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/IndexingUtils.cpp.o [ 57%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-9p16c-minmax-fp32-neonv8-mul16.c.o [ 57%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-9p32c-minmax-fp32-neonv8-mul16.c.o [ 57%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-25p8c-minmax-fp32-neonv8-mul16.c.o [ 57%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-25p16c-minmax-fp32-neonv8-mul16.c.o [ 57%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-25p32c-minmax-fp32-neonv8-mul16.c.o [ 57%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gavgpool/gen/qu8-gavgpool-7p7x-minmax-fp32-neonv8-c8.c.o [ 57%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gavgpool/gen/qu8-gavgpool-7p7x-minmax-fp32-neonv8-c16.c.o [ 57%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gavgpool/gen/qu8-gavgpool-7p7x-minmax-fp32-neonv8-c24.c.o [ 57%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gavgpool/gen/qu8-gavgpool-7p7x-minmax-fp32-neonv8-c32.c.o [ 57%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gavgpool/gen/qu8-gavgpool-7x-minmax-fp32-neonv8-c8.c.o [ 57%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gavgpool/gen/qu8-gavgpool-7x-minmax-fp32-neonv8-c16.c.o [ 57%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gavgpool/gen/qu8-gavgpool-7x-minmax-fp32-neonv8-c24.c.o [ 58%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gavgpool/gen/qu8-gavgpool-7x-minmax-fp32-neonv8-c32.c.o [ 58%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-1x16-minmax-fp32-neonv8-mlal-lane.c.o [ 58%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-4x16-minmax-fp32-neonv8-mlal-lane.c.o [ 58%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-1x16-minmax-fp32-neonv8-mlal-lane.c.o [ 58%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-4x16-minmax-fp32-neonv8-mlal-lane.c.o [ 58%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vmul/gen/qu8-vmul-minmax-fp32-neonv8-ld64-u8.c.o [ 58%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vmul/gen/qu8-vmul-minmax-fp32-neonv8-ld64-u16.c.o [ 58%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vmul/gen/qu8-vmul-minmax-fp32-neonv8-ld128-u16.c.o [ 58%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vmulc/gen/qu8-vmulc-minmax-fp32-neonv8-ld64-u8.c.o [ 58%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vmulc/gen/qu8-vmulc-minmax-fp32-neonv8-ld64-u16.c.o [ 58%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vmulc/gen/qu8-vmulc-minmax-fp32-neonv8-ld128-u16.c.o [ 58%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vdiv-minmax-aarch64-neon-u4.c.o [ 58%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vdiv-minmax-aarch64-neon-u8.c.o [ 58%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/Integration.cpp.o [ 58%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vdivc-minmax-aarch64-neon-u4.c.o [ 58%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vdivc-minmax-aarch64-neon-u8.c.o [ 58%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vrdivc-minmax-aarch64-neon-u4.c.o [ 58%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vrdivc-minmax-aarch64-neon-u8.c.o [ 58%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsqrt/gen/f32-vsqrt-aarch64-neon-sqrt-u4.c.o [ 58%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsqrt/gen/f32-vsqrt-aarch64-neon-sqrt-u8.c.o [ 58%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsqrt/gen/f32-vsqrt-aarch64-neon-sqrt-u16.c.o [ 58%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x8-lut/gen/x8-lut-aarch64-neon-tbx128x4-u16.c.o [ 58%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x8-lut/gen/x8-lut-aarch64-neon-tbx128x4-u32.c.o [ 58%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x8-lut/gen/x8-lut-aarch64-neon-tbx128x4-u48.c.o [ 58%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/Itertools.cpp.o [ 58%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x8-lut/gen/x8-lut-aarch64-neon-tbx128x4-u64.c.o [ 58%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x24-transposec/x24-transposec-4x4-aarch64-neon-tbl128.c.o [ 58%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-transposec/x32-transposec-4x4-aarch64-neon-tbl128.c.o [ 58%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-conv-hwc2chw/f32-conv-hwc2chw-3x3s2p1c3x4-aarch64-neonfma-2x2.c.o [ 58%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-conv-hwc/gen/f32-conv-hwc-3x3s2p0p1c3x4-aarch64-neonfma-2x1.c.o [ 58%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-conv-hwc/gen/f32-conv-hwc-3x3s2p0p1c3x4-aarch64-neonfma-2x2.c.o [ 58%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-conv-hwc/gen/f32-conv-hwc-3x3s2p0p1c3x8-aarch64-neonfma-2x1.c.o [ 58%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-conv-hwc/gen/f32-conv-hwc-3x3s2p0p1c3x8-aarch64-neonfma-2x2.c.o [ 58%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-conv-hwc/gen/f32-conv-hwc-3x3s2p1c3x4-aarch64-neonfma-2x1.c.o
[ 58%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-conv-hwc/gen/f32-conv-hwc-3x3s2p1c3x4-aarch64-neonfma-2x2.c.o
[ 58%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-conv-hwc/gen/f32-conv-hwc-3x3s2p1c3x8-aarch64-neonfma-2x1.c.o
[ 58%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-conv-hwc/gen/f32-conv-hwc-3x3s2p1c3x8-aarch64-neonfma-2x2.c.o
[ 58%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-3x3p1-minmax-aarch64-neonfma-1x4-acc2.c.o
[ 58%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-3x3p1-minmax-aarch64-neonfma-1x4-acc3.c.o
[ 58%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-3x3p1-minmax-aarch64-neonfma-1x4-acc4.c.o
[ 58%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-3x3p1-minmax-aarch64-neonfma-1x4.c.o
[ 58%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-3x3p1-minmax-aarch64-neonfma-2x4-acc2.c.o
[ 58%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-3x3p1-minmax-aarch64-neonfma-2x4.c.o
[ 58%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-3x3p1-minmax-aarch64-neonfma-3x4.c.o
[ 58%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-3x3p1-minmax-aarch64-neonfma-4x4.c.o
[ 58%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-3x3p1-minmax-aarch64-neonfma-5x4.c.o
[ 58%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-3x3p1-minmax-aarch64-neonfma-6x4.c.o
[ 58%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-3x3s2p1-minmax-aarch64-neonfma-1x4-acc2.c.o
[ 58%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/LegacyBatching.cpp.o
[ 58%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-3x3s2p1-minmax-aarch64-neonfma-1x4-acc3.c.o
[ 58%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-3x3s2p1-minmax-aarch64-neonfma-1x4-acc4.c.o
[ 58%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-3x3s2p1-minmax-aarch64-neonfma-1x4.c.o
[ 58%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-3x3s2p1-minmax-aarch64-neonfma-2x4-acc2.c.o
[ 58%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-3x3s2p1-minmax-aarch64-neonfma-2x4.c.o
[ 58%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/LegacyBridge.cpp.o
[ 58%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/Lerp.cpp.o
[ 58%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-3x3s2p1-minmax-aarch64-neonfma-3x4.c.o
[ 58%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-3x3s2p1-minmax-aarch64-neonfma-4x4.c.o
[ 58%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-5x5p2-minmax-aarch64-neonfma-1x4-acc2.c.o
[ 58%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-5x5p2-minmax-aarch64-neonfma-1x4-acc3.c.o
[ 58%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-5x5p2-minmax-aarch64-neonfma-1x4-acc4.c.o
[ 58%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-5x5p2-minmax-aarch64-neonfma-1x4-acc5.c.o
[ 58%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-5x5p2-minmax-aarch64-neonfma-1x4.c.o
[ 58%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-5x5p2-minmax-aarch64-neonfma-2x4-acc2.c.o
[ 59%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-5x5p2-minmax-aarch64-neonfma-2x4-acc3.c.o
[ 59%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-5x5p2-minmax-aarch64-neonfma-2x4.c.o
[ 59%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-5x5p2-minmax-aarch64-neonfma-3x4-acc2.c.o
[ 59%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-5x5p2-minmax-aarch64-neonfma-3x4.c.o
[ 59%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-5x5p2-minmax-aarch64-neonfma-4x4-acc2.c.o
[ 59%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-5x5p2-minmax-aarch64-neonfma-4x4.c.o
[ 59%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/Linear.cpp.o
[ 59%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-5x5p2-minmax-aarch64-neonfma-5x4.c.o
[ 59%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-5x5s2p2-minmax-aarch64-neonfma-1x4-acc2.c.o
[ 59%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-5x5s2p2-minmax-aarch64-neonfma-1x4-acc3.c.o
[ 59%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-5x5s2p2-minmax-aarch64-neonfma-1x4-acc4.c.o
[ 59%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/LinearAlgebra.cpp.o
[ 59%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-5x5s2p2-minmax-aarch64-neonfma-1x4-acc5.c.o
[ 59%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-5x5s2p2-minmax-aarch64-neonfma-1x4.c.o
[ 59%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-5x5s2p2-minmax-aarch64-neonfma-2x4-acc2.c.o
[ 59%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-5x5s2p2-minmax-aarch64-neonfma-2x4-acc3.c.o
[ 59%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-5x5s2p2-minmax-aarch64-neonfma-2x4.c.o
[ 59%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-5x5s2p2-minmax-aarch64-neonfma-3x4-acc2.c.o
[ 59%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-5x5s2p2-minmax-aarch64-neonfma-3x4.c.o
[ 59%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-1x8-minmax-aarch64-neonfma-lane-ld64.c.o
[ 59%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-1x8-minmax-aarch64-neonfma-lane-ld128.c.o
[ 59%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-1x16-minmax-aarch64-neonfma-lane-ld128.c.o
[ 59%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-2x16-minmax-aarch64-neonfma-lane-ld128.c.o
[ 59%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-3x16-minmax-aarch64-neonfma-lane-ld128.c.o
[ 59%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-4x2-minmax-aarch64-neonfma-lane-ld64.c.o
[ 59%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-4x8-minmax-aarch64-neonfma-lane-ld64.c.o
[ 59%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-4x8-minmax-aarch64-neonfma-lane-ld128.c.o
[ 59%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-4x16-minmax-aarch64-neonfma-lane-ld128.c.o
[ 59%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-5x8-minmax-aarch64-neonfma-lane-ld64.c.o
[ 59%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-5x16-minmax-aarch64-neonfma-lane-ld128.c.o
[ 59%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-6x2-minmax-aarch64-neonfma-lane-ld64.c.o
[ 59%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-6x8-minmax-aarch64-neonfma-lane-ld64.c.o
[ 59%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-6x8-minmax-aarch64-neonfma-lane-ld128.c.o
[ 59%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-6x16-minmax-aarch64-neonfma-lane-ld128.c.o
[ 59%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemminc/gen/f32-gemminc-1x8-minmax-aarch64-neonfma-lane-ld64.c.o
[ 59%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/Loss.cpp.o
[ 59%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemminc/gen/f32-gemminc-1x8-minmax-aarch64-neonfma-lane-ld128.c.o
[ 59%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemminc/gen/f32-gemminc-4x8-minmax-aarch64-neonfma-lane-ld64.c.o
[ 59%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemminc/gen/f32-gemminc-4x8-minmax-aarch64-neonfma-lane-ld128.c.o
[ 59%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemminc/gen/f32-gemminc-5x8-minmax-aarch64-neonfma-lane-ld64.c.o
[ 59%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemminc/gen/f32-gemminc-6x8-minmax-aarch64-neonfma-lane-ld64.c.o
[ 59%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemminc/gen/f32-gemminc-6x8-minmax-aarch64-neonfma-lane-ld128.c.o
[ 59%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-1x8-minmax-aarch64-neonfma-lane-ld64.c.o
[ 59%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-1x8-minmax-aarch64-neonfma-lane-ld128.c.o
[ 59%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-1x16-minmax-aarch64-neonfma-lane-ld128.c.o
[ 59%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/LossCTC.cpp.o
[ 59%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-2x16-minmax-aarch64-neonfma-lane-ld128.c.o
[ 59%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-3x16-minmax-aarch64-neonfma-lane-ld128.c.o
[ 59%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-4x2-minmax-aarch64-neonfma-lane-ld64.c.o
[ 59%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-4x4-minmax-aarch64-neonfma-lane-ld64.c.o
[ 59%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-4x8-minmax-aarch64-neonfma-lane-ld64.c.o
[ 59%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-4x8-minmax-aarch64-neonfma-lane-ld128.c.o
[ 59%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-4x16-minmax-aarch64-neonfma-lane-ld128.c.o
[ 59%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-5x16-minmax-aarch64-neonfma-lane-ld128.c.o
[ 59%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-6x2-minmax-aarch64-neonfma-lane-ld64.c.o
[ 59%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-6x8-minmax-aarch64-neonfma-lane-ld64.c.o
[ 59%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-6x8-minmax-aarch64-neonfma-lane-ld128.c.o
[ 59%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-6x16-minmax-aarch64-neonfma-lane-ld128.c.o
[ 59%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-ppmm/gen/f32-ppmm-4x8-minmax-aarch64-neonfma-prfm.c.o
[ 59%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/LossMultiLabelMargin.cpp.o
[ 59%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-ppmm/gen/f32-ppmm-4x8-minmax-aarch64-neonfma.c.o
[ 59%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-ppmm/gen/f32-ppmm-4x16-minmax-aarch64-neonfma-prfm.c.o
[ 59%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-ppmm/gen/f32-ppmm-4x16-minmax-aarch64-neonfma.c.o
[ 59%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-ppmm/gen/f32-ppmm-8x8-minmax-aarch64-neonfma-prfm.c.o
[ 59%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-ppmm/gen/f32-ppmm-8x8-minmax-aarch64-neonfma.c.o
[ 60%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc4w-gemm/gen/f32-qc4w-gemm-1x8-minmax-aarch64-neonfma-lane-ld64.c.o
[ 60%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc4w-gemm/gen/f32-qc4w-gemm-1x8-minmax-aarch64-neonfma-lane-ld128.c.o
[ 60%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc4w-gemm/gen/f32-qc4w-gemm-4x8-minmax-aarch64-neonfma-lane-ld64.c.o
[ 60%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc4w-gemm/gen/f32-qc4w-gemm-4x8-minmax-aarch64-neonfma-lane-ld128.c.o
[ 60%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc4w-gemm/gen/f32-qc4w-gemm-5x8-minmax-aarch64-neonfma-lane-ld64.c.o
[ 60%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc4w-gemm/gen/f32-qc4w-gemm-6x8-minmax-aarch64-neonfma-lane-ld64.c.o
[ 60%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/LossMultiMargin.cpp.o
[ 60%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc4w-gemm/gen/f32-qc4w-gemm-6x8-minmax-aarch64-neonfma-lane-ld128.c.o
[ 60%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-1x8-minmax-aarch64-neonfma-lane-ld64.c.o
[ 60%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-1x8-minmax-aarch64-neonfma-lane-ld128.c.o
[ 60%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/LossNLL.cpp.o
[ 60%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-1x16-minmax-aarch64-neonfma-lane-ld128.c.o
[ 60%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-4x2-minmax-aarch64-neonfma-lane-ld64.c.o
[ 60%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/LossNLL2d.cpp.o
[ 60%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-4x8-minmax-aarch64-neonfma-lane-ld64.c.o
[ 60%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-4x8-minmax-aarch64-neonfma-lane-ld128.c.o
[ 60%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-4x16-minmax-aarch64-neonfma-lane-ld128.c.o
[ 60%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-5x8-minmax-aarch64-neonfma-lane-ld64.c.o
[ 60%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-6x2-minmax-aarch64-neonfma-lane-ld64.c.o
[ 60%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-6x8-minmax-aarch64-neonfma-lane-ld64.c.o
[ 60%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-6x8-minmax-aarch64-neonfma-lane-ld128.c.o
[ 60%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-spmm/gen/f32-spmm-4x2-minmax-aarch64-neonfma.c.o
[ 60%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-spmm/gen/f32-spmm-4x4-minmax-aarch64-neonfma.c.o
[ 60%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-spmm/gen/f32-spmm-8x2-minmax-aarch64-neonfma.c.o
[ 60%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/MaxPooling.cpp.o
[ 60%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-spmm/gen/f32-spmm-8x4-minmax-aarch64-neonfma.c.o
[ 60%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-spmm/gen/f32-spmm-12x2-minmax-aarch64-neonfma.c.o
[ 60%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-spmm/gen/f32-spmm-12x4-minmax-aarch64-neonfma.c.o
[ 60%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-spmm/gen/f32-spmm-16x2-minmax-aarch64-neonfma.c.o
[ 60%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-spmm/gen/f32-spmm-16x4-minmax-aarch64-neonfma.c.o
[ 60%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-spmm/gen/f32-spmm-32x2-minmax-aarch64-neonfma.c.o
[ 60%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-spmm/gen/f32-spmm-32x4-minmax-aarch64-neonfma.c.o
[ 60%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-aarch64-neonfma-rr1-lut64-p2-div-u4.c.o
[ 60%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-aarch64-neonfma-rr1-lut64-p2-div-u8.c.o
[ 60%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-aarch64-neonfma-rr1-lut64-p2-div-u12.c.o
[ 60%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-aarch64-neonfma-rr1-lut64-p2-div-u16.c.o
[ 60%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-aarch64-neonfma-rr1-lut64-p2-div-u20.c.o
[ 60%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/MaxUnpooling.cpp.o
[ 60%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-aarch64-neonfma-rr1-lut64-p2-div-u24.c.o
[ 60%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/Memory.cpp.o
[ 60%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-aarch64-neonfma-rr1-lut2048-p1-div-u4.c.o
[ 60%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/MetaTensor.cpp.o
[ 60%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-aarch64-neonfma-rr1-lut2048-p1-div-u8.c.o
[ 60%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-aarch64-neonfma-rr1-lut2048-p1-div-u12.c.o
[ 60%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-aarch64-neonfma-rr1-lut2048-p1-div-u16.c.o
[ 60%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-aarch64-neonfma-rr1-lut2048-p1-div-u20.c.o
[ 60%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-aarch64-neonfma-rr1-lut2048-p1-div-u24.c.o
[ 60%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-aarch64-neonfma-rr1-p5-div-u4.c.o
[ 60%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-aarch64-neonfma-rr1-p5-div-u8.c.o
[ 60%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/NNPACK.cpp.o
[ 60%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-aarch64-neonfma-rr1-p5-div-u12.c.o
[ 60%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-aarch64-neonfma-rr1-p5-div-u16.c.o
[ 60%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-aarch64-neonfma-rr1-p5-div-u20.c.o
[ 60%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-aarch64-neonfma-rr1-p5-div-u24.c.o
[ 60%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-aarch64-neonfma-expm1minus-rr1-lut8-p4h3ts-div-u4.c.o
[ 61%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/NaiveConvolutionTranspose2d.cpp.o
[ 61%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-aarch64-neonfma-expm1minus-rr1-lut8-p4h3ts-div-u8.c.o
[ 61%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-aarch64-neonfma-expm1minus-rr1-lut8-p4h3ts-div-u12.c.o
[ 61%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/NaiveConvolutionTranspose3d.cpp.o
[ 61%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-aarch64-neonfma-expm1minus-rr1-lut8-p4h3ts-div-u16.c.o
[ 61%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-aarch64-neonfma-expm1minus-rr1-p6h5ts-div-u4.c.o
[ 61%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-aarch64-neonfma-expm1minus-rr1-p6h5ts-div-u8.c.o
[ 61%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-aarch64-neonfma-expm1minus-rr1-p6h5ts-div-u12.c.o
[ 61%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-aarch64-neonfma-expm1minus-rr1-p6h5ts-div-u16.c.o
[ 61%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-sigmoid-aarch64-neonfma-rr1-lut64-p2-div.c.o
[ 61%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-sigmoid-aarch64-neonfma-rr1-lut2048-p1-div.c.o
[ 61%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-sigmoid-aarch64-neonfma-rr1-p5-div.c.o
[ 61%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-sigmoid-aarch64-neonfma-rr2-lut64-p2-div.c.o
[ 61%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-sigmoid-aarch64-neonfma-rr2-lut2048-p1-div.c.o
[ 61%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/NaiveDilatedConvolution.cpp.o
[ 61%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-sigmoid-aarch64-neonfma-rr2-p5-div.c.o
[ 62%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-aarch64-neonfma-expm1minus-rr1-lut8-p4h3ps-div.c.o
[ 62%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-aarch64-neonfma-expm1minus-rr1-p6h5ts-div.c.o
[ 62%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vadd-minmax-fp16arith-u1.c.o
[ 62%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vadd-minmax-fp16arith-u2.c.o
[ 62%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vadd-minmax-fp16arith-u4.c.o
[ 62%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vaddc-minmax-fp16arith-u1.c.o
[ 62%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vaddc-minmax-fp16arith-u2.c.o
[ 62%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vaddc-minmax-fp16arith-u4.c.o
[ 62%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vdiv-minmax-fp16arith-u1.c.o
[ 62%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vdiv-minmax-fp16arith-u2.c.o
[ 62%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vdiv-minmax-fp16arith-u4.c.o
[ 62%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vdivc-minmax-fp16arith-u1.c.o
[ 62%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vdivc-minmax-fp16arith-u2.c.o
[ 62%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/NamedTensor.cpp.o
[ 62%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vdivc-minmax-fp16arith-u4.c.o
[ 62%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vmax-fp16arith-u1.c.o
[ 62%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/NegateFallback.cpp.o
[ 62%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vmax-fp16arith-u2.c.o
[ 62%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vmax-fp16arith-u4.c.o
[ 62%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vmaxc-fp16arith-u1.c.o
[ 62%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vmaxc-fp16arith-u2.c.o
[ 62%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vmaxc-fp16arith-u4.c.o
[ 62%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vmin-fp16arith-u1.c.o
[ 62%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vmin-fp16arith-u2.c.o
[ 62%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vmin-fp16arith-u4.c.o
[ 62%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vminc-fp16arith-u1.c.o
[ 62%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vminc-fp16arith-u2.c.o
[ 62%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vminc-fp16arith-u4.c.o
[ 62%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vmul-minmax-fp16arith-u1.c.o
[ 62%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vmul-minmax-fp16arith-u2.c.o
[ 62%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vmul-minmax-fp16arith-u4.c.o
[ 62%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vmulc-minmax-fp16arith-u1.c.o
[ 62%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vmulc-minmax-fp16arith-u2.c.o
[ 62%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/Normalization.cpp.o
[ 62%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vmulc-minmax-fp16arith-u4.c.o
[ 62%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/Onehot.cpp.o
[ 62%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vrdivc-minmax-fp16arith-u1.c.o
[ 62%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vrdivc-minmax-fp16arith-u2.c.o
[ 62%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vrdivc-minmax-fp16arith-u4.c.o
[ 62%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vrsubc-minmax-fp16arith-u1.c.o
[ 62%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vrsubc-minmax-fp16arith-u2.c.o
[ 62%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vrsubc-minmax-fp16arith-u4.c.o
[ 62%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vsqrdiff-fp16arith-u1.c.o
[ 62%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vsqrdiff-fp16arith-u2.c.o
[ 62%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/PackedSequence.cpp.o
[ 62%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vsqrdiff-fp16arith-u4.c.o
[ 62%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vsqrdiffc-fp16arith-u1.c.o
[ 62%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vsqrdiffc-fp16arith-u2.c.o
[ 62%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vsqrdiffc-fp16arith-u4.c.o
[ 62%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vsub-minmax-fp16arith-u1.c.o
[ 62%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vsub-minmax-fp16arith-u2.c.o
[ 62%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vsub-minmax-fp16arith-u4.c.o
[ 62%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vsubc-minmax-fp16arith-u1.c.o
[ 62%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vsubc-minmax-fp16arith-u2.c.o
[ 62%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vsubc-minmax-fp16arith-u4.c.o
[ 62%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/PadNd.cpp.o
[ 62%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vsqrt/gen/f16-vsqrt-fp16arith-sqrt-u1.c.o
[ 62%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vsqrt/gen/f16-vsqrt-fp16arith-sqrt-u2.c.o
[ 62%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vsqrt/gen/f16-vsqrt-fp16arith-sqrt-u4.c.o
[ 62%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc4w-gemm/gen/qd8-f16-qc4w-gemm-1x8c4-minmax-neondotfp16arith.c.o
[ 62%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc4w-gemm/gen/qd8-f16-qc4w-gemm-1x16c4-minmax-neondotfp16arith.c.o
[ 62%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc4w-gemm/gen/qd8-f16-qc4w-gemm-2x8c4-minmax-neondotfp16arith.c.o
[ 62%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc4w-gemm/gen/qd8-f16-qc4w-gemm-2x16c4-minmax-neondotfp16arith.c.o
[ 62%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc4w-gemm/gen/qd8-f16-qc4w-gemm-3x8c4-minmax-neondotfp16arith.c.o
[ 62%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc4w-gemm/gen/qd8-f16-qc4w-gemm-3x16c4-minmax-neondotfp16arith.c.o
[ 62%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/PixelShuffle.cpp.o
[ 63%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc4w-gemm/gen/qd8-f16-qc4w-gemm-4x8c4-minmax-neondotfp16arith.c.o
[ 63%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc4w-gemm/gen/qd8-f16-qc4w-gemm-4x16c4-minmax-neondotfp16arith.c.o
[ 63%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc4w-gemm/gen/qd8-f16-qc4w-gemm-5x8c4-minmax-neondotfp16arith.c.o
[ 63%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc4w-gemm/gen/qd8-f16-qc4w-gemm-5x16c4-minmax-neondotfp16arith.c.o
[ 63%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc4w-gemm/gen/qd8-f16-qc4w-gemm-6x8c4-minmax-neondotfp16arith.c.o
[ 63%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc4w-gemm/gen/qd8-f16-qc4w-gemm-6x16c4-minmax-neondotfp16arith.c.o
[ 63%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc8w-gemm/gen/qd8-f16-qc8w-gemm-1x8c4-minmax-neondotfp16arith.c.o
[ 63%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc8w-gemm/gen/qd8-f16-qc8w-gemm-1x16c4-minmax-neondotfp16arith.c.o
[ 63%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/PointwiseOps.cpp.o
[ 63%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc8w-gemm/gen/qd8-f16-qc8w-gemm-2x8c4-minmax-neondotfp16arith.c.o
[ 63%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc8w-gemm/gen/qd8-f16-qc8w-gemm-2x16c4-minmax-neondotfp16arith.c.o
[ 63%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc8w-gemm/gen/qd8-f16-qc8w-gemm-3x8c4-minmax-neondotfp16arith.c.o
[ 63%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc8w-gemm/gen/qd8-f16-qc8w-gemm-3x16c4-minmax-neondotfp16arith.c.o
[ 63%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc8w-gemm/gen/qd8-f16-qc8w-gemm-4x8c4-minmax-neondotfp16arith.c.o
[ 63%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/Pooling.cpp.o
[ 63%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc8w-gemm/gen/qd8-f16-qc8w-gemm-4x16c4-minmax-neondotfp16arith.c.o
[ 63%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc8w-gemm/gen/qd8-f16-qc8w-gemm-5x8c4-minmax-neondotfp16arith.c.o
[ 63%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/Pow.cpp.o
[ 63%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc8w-gemm/gen/qd8-f16-qc8w-gemm-5x16c4-minmax-neondotfp16arith.c.o
[ 63%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc8w-gemm/gen/qd8-f16-qc8w-gemm-6x8c4-minmax-neondotfp16arith.c.o
[ 63%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc8w-gemm/gen/qd8-f16-qc8w-gemm-6x16c4-minmax-neondotfp16arith.c.o
[ 63%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/QuantizedLinear.cpp.o
[ 63%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc8w-igemm/gen/qd8-f16-qc8w-igemm-1x8c4-minmax-neondotfp16arith.c.o
[ 63%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc8w-igemm/gen/qd8-f16-qc8w-igemm-1x16c4-minmax-neondotfp16arith.c.o
[ 63%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc8w-igemm/gen/qd8-f16-qc8w-igemm-1x32c4-minmax-neondotfp16arith.c.o
[ 63%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc8w-igemm/gen/qd8-f16-qc8w-igemm-2x8c4-minmax-neondotfp16arith.c.o
[ 63%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc8w-igemm/gen/qd8-f16-qc8w-igemm-2x16c4-minmax-neondotfp16arith.c.o
[ 63%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc8w-igemm/gen/qd8-f16-qc8w-igemm-2x32c4-minmax-neondotfp16arith.c.o
[ 63%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc8w-igemm/gen/qd8-f16-qc8w-igemm-4x8c4-minmax-neondotfp16arith.c.o
[ 63%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc8w-igemm/gen/qd8-f16-qc8w-igemm-4x16c4-minmax-neondotfp16arith.c.o
[ 63%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc8w-igemm/gen/qd8-f16-qc8w-igemm-4x32c4-minmax-neondotfp16arith.c.o
[ 63%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/RNN.cpp.o
[ 63%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc8w-igemm/gen/qd8-f16-qc8w-igemm-6x8c4-minmax-neondotfp16arith.c.o
[ 63%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc8w-igemm/gen/qd8-f16-qc8w-igemm-6x16c4-minmax-neondotfp16arith.c.o
[ 63%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc8w-igemm/gen/qd8-f16-qc8w-igemm-6x32c4-minmax-neondotfp16arith.c.o
[ 63%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/RangeFactories.cpp.o
[ 63%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc8w-igemm/gen/qd8-f16-qc8w-igemm-8x8c4-minmax-neondotfp16arith.c.o
[ 63%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/ReduceAllOps.cpp.o
[ 63%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc8w-igemm/gen/qd8-f16-qc8w-igemm-8x16c4-minmax-neondotfp16arith.c.o
[ 63%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc8w-igemm/gen/qd8-f16-qc8w-igemm-8x32c4-minmax-neondotfp16arith.c.o
[ 63%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-avgpool/f16-avgpool-9p8x-minmax-neonfp16arith-c8.c.o
[ 63%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-avgpool/f16-avgpool-9x-minmax-neonfp16arith-c8.c.o
[ 63%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-conv-hwc2chw/f16-conv-hwc2chw-3x3s2p1c3x4-neonfp16arith-2x2.c.o
[ 63%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv2d-chw/gen/f16-dwconv2d-chw-3x3p1-minmax-neonfp16arith-1x8-acc2.c.o
[ 63%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv2d-chw/gen/f16-dwconv2d-chw-3x3p1-minmax-neonfp16arith-1x8-acc3.c.o
[ 63%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv2d-chw/gen/f16-dwconv2d-chw-3x3p1-minmax-neonfp16arith-1x8-acc4.c.o
[ 63%] Building C object
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv2d-chw/gen/f16-dwconv2d-chw-3x3p1-minmax-neonfp16arith-1x8.c.o [ 63%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv2d-chw/gen/f16-dwconv2d-chw-3x3p1-minmax-neonfp16arith-2x8-acc2.c.o [ 63%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv2d-chw/gen/f16-dwconv2d-chw-3x3p1-minmax-neonfp16arith-2x8.c.o [ 63%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv2d-chw/gen/f16-dwconv2d-chw-3x3p1-minmax-neonfp16arith-3x8.c.o [ 63%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/ReduceOps.cpp.o [ 63%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv2d-chw/gen/f16-dwconv2d-chw-3x3p1-minmax-neonfp16arith-4x8.c.o [ 63%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv2d-chw/gen/f16-dwconv2d-chw-3x3p1-minmax-neonfp16arith-5x8.c.o [ 63%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv2d-chw/gen/f16-dwconv2d-chw-3x3p1-minmax-neonfp16arith-6x8.c.o [ 63%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv2d-chw/gen/f16-dwconv2d-chw-3x3s2p1-minmax-neonfp16arith-1x8-acc2.c.o [ 63%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv2d-chw/gen/f16-dwconv2d-chw-3x3s2p1-minmax-neonfp16arith-1x8-acc3.c.o [ 63%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv2d-chw/gen/f16-dwconv2d-chw-3x3s2p1-minmax-neonfp16arith-1x8-acc4.c.o [ 63%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv2d-chw/gen/f16-dwconv2d-chw-3x3s2p1-minmax-neonfp16arith-1x8.c.o [ 63%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv2d-chw/gen/f16-dwconv2d-chw-3x3s2p1-minmax-neonfp16arith-2x8-acc2.c.o [ 63%] Building CXX object 
caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/ReflectionPad.cpp.o [ 63%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv2d-chw/gen/f16-dwconv2d-chw-3x3s2p1-minmax-neonfp16arith-2x8.c.o [ 63%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv2d-chw/gen/f16-dwconv2d-chw-3x3s2p1-minmax-neonfp16arith-3x8.c.o [ 63%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv2d-chw/gen/f16-dwconv2d-chw-3x3s2p1-minmax-neonfp16arith-4x8.c.o [ 63%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv2d-chw/gen/f16-dwconv2d-chw-5x5p2-minmax-neonfp16arith-1x8-acc2.c.o [ 63%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv2d-chw/gen/f16-dwconv2d-chw-5x5p2-minmax-neonfp16arith-1x8-acc3.c.o [ 63%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv2d-chw/gen/f16-dwconv2d-chw-5x5p2-minmax-neonfp16arith-1x8-acc4.c.o [ 63%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv2d-chw/gen/f16-dwconv2d-chw-5x5p2-minmax-neonfp16arith-1x8-acc5.c.o [ 63%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv2d-chw/gen/f16-dwconv2d-chw-5x5p2-minmax-neonfp16arith-1x8.c.o [ 63%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv2d-chw/gen/f16-dwconv2d-chw-5x5p2-minmax-neonfp16arith-2x8-acc2.c.o [ 64%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv2d-chw/gen/f16-dwconv2d-chw-5x5p2-minmax-neonfp16arith-2x8-acc3.c.o [ 64%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv2d-chw/gen/f16-dwconv2d-chw-5x5p2-minmax-neonfp16arith-2x8.c.o [ 64%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv2d-chw/gen/f16-dwconv2d-chw-5x5p2-minmax-neonfp16arith-3x8-acc2.c.o [ 64%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv2d-chw/gen/f16-dwconv2d-chw-5x5p2-minmax-neonfp16arith-3x8.c.o [ 64%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv2d-chw/gen/f16-dwconv2d-chw-5x5p2-minmax-neonfp16arith-4x8-acc2.c.o [ 64%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/Repeat.cpp.o [ 64%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv2d-chw/gen/f16-dwconv2d-chw-5x5p2-minmax-neonfp16arith-4x8.c.o [ 64%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv2d-chw/gen/f16-dwconv2d-chw-5x5p2-minmax-neonfp16arith-5x8.c.o [ 64%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv2d-chw/gen/f16-dwconv2d-chw-5x5s2p2-minmax-neonfp16arith-1x8-acc2.c.o [ 64%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv2d-chw/gen/f16-dwconv2d-chw-5x5s2p2-minmax-neonfp16arith-1x8-acc3.c.o [ 64%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv2d-chw/gen/f16-dwconv2d-chw-5x5s2p2-minmax-neonfp16arith-1x8-acc4.c.o [ 64%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv2d-chw/gen/f16-dwconv2d-chw-5x5s2p2-minmax-neonfp16arith-1x8-acc5.c.o [ 64%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv2d-chw/gen/f16-dwconv2d-chw-5x5s2p2-minmax-neonfp16arith-1x8.c.o [ 64%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv2d-chw/gen/f16-dwconv2d-chw-5x5s2p2-minmax-neonfp16arith-2x8-acc2.c.o [ 64%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/ReplicationPadding.cpp.o [ 64%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv2d-chw/gen/f16-dwconv2d-chw-5x5s2p2-minmax-neonfp16arith-2x8-acc3.c.o [ 64%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv2d-chw/gen/f16-dwconv2d-chw-5x5s2p2-minmax-neonfp16arith-2x8.c.o [ 64%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/Resize.cpp.o [ 64%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv2d-chw/gen/f16-dwconv2d-chw-5x5s2p2-minmax-neonfp16arith-3x8-acc2.c.o [ 64%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv2d-chw/gen/f16-dwconv2d-chw-5x5s2p2-minmax-neonfp16arith-3x8.c.o [ 64%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv/gen/f16-dwconv-3p8c-minmax-neonfp16arith-acc2.c.o [ 64%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv/gen/f16-dwconv-3p8c-minmax-neonfp16arith.c.o [ 64%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/RowwisePrune.cpp.o [ 64%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv/gen/f16-dwconv-3p16c-minmax-neonfp16arith-acc2.c.o [ 64%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv/gen/f16-dwconv-3p16c-minmax-neonfp16arith.c.o [ 64%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv/gen/f16-dwconv-3p32c-minmax-neonfp16arith-acc2.c.o [ 64%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv/gen/f16-dwconv-3p32c-minmax-neonfp16arith.c.o [ 64%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv/gen/f16-dwconv-4p8c-minmax-neonfp16arith-acc2.c.o [ 64%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv/gen/f16-dwconv-4p8c-minmax-neonfp16arith.c.o [ 64%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv/gen/f16-dwconv-4p16c-minmax-neonfp16arith-acc2.c.o [ 64%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv/gen/f16-dwconv-4p16c-minmax-neonfp16arith.c.o [ 64%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv/gen/f16-dwconv-4p32c-minmax-neonfp16arith-acc2.c.o [ 64%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/Scalar.cpp.o [ 64%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv/gen/f16-dwconv-4p32c-minmax-neonfp16arith.c.o [ 64%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv/gen/f16-dwconv-5f5m5l8c8s4r-minmax-neonfp16arith-acc2.c.o [ 64%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv/gen/f16-dwconv-5f5m5l8c8s4r-minmax-neonfp16arith.c.o [ 64%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv/gen/f16-dwconv-5f5m5l16c8s4r-minmax-neonfp16arith-acc2.c.o [ 64%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv/gen/f16-dwconv-5f5m5l16c8s4r-minmax-neonfp16arith.c.o [ 64%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/SegmentReduce.cpp.o [ 64%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv/gen/f16-dwconv-5f5m5l32c8s4r-minmax-neonfp16arith-acc2.c.o [ 64%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/SobolEngineOps.cpp.o [ 64%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv/gen/f16-dwconv-5f5m5l32c8s4r-minmax-neonfp16arith.c.o [ 64%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv/gen/f16-dwconv-6f6m7l8c8s4r-minmax-neonfp16arith-acc2.c.o [ 64%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv/gen/f16-dwconv-6f6m7l8c8s4r-minmax-neonfp16arith.c.o [ 64%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv/gen/f16-dwconv-6f6m7l16c8s4r-minmax-neonfp16arith-acc2.c.o [ 64%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv/gen/f16-dwconv-6f6m7l16c8s4r-minmax-neonfp16arith.c.o [ 64%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv/gen/f16-dwconv-6f6m7l32c8s4r-minmax-neonfp16arith-acc2.c.o [ 64%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/SobolEngineOpsUtils.cpp.o [ 64%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv/gen/f16-dwconv-6f6m7l32c8s4r-minmax-neonfp16arith.c.o [ 64%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv/gen/f16-dwconv-8f8m9l8c8s4r-minmax-neonfp16arith-acc2.c.o [ 64%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv/gen/f16-dwconv-8f8m9l8c8s4r-minmax-neonfp16arith.c.o [ 64%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv/gen/f16-dwconv-8f8m9l16c8s4r-minmax-neonfp16arith-acc2.c.o [ 64%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv/gen/f16-dwconv-8f8m9l16c8s4r-minmax-neonfp16arith.c.o [ 64%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv/gen/f16-dwconv-8f8m9l32c8s4r-minmax-neonfp16arith-acc2.c.o [ 64%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/SoftMax.cpp.o [ 64%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv/gen/f16-dwconv-8f8m9l32c8s4r-minmax-neonfp16arith.c.o [ 64%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv/gen/f16-dwconv-9p8c-minmax-neonfp16arith-acc2.c.o [ 64%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv/gen/f16-dwconv-9p8c-minmax-neonfp16arith.c.o [ 64%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv/gen/f16-dwconv-9p16c-minmax-neonfp16arith-acc2.c.o [ 64%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv/gen/f16-dwconv-9p16c-minmax-neonfp16arith.c.o [ 64%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv/gen/f16-dwconv-9p32c-minmax-neonfp16arith-acc2.c.o [ 64%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv/gen/f16-dwconv-9p32c-minmax-neonfp16arith.c.o [ 64%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv/gen/f16-dwconv-25p8c-minmax-neonfp16arith-acc2.c.o [ 64%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv/gen/f16-dwconv-25p8c-minmax-neonfp16arith.c.o [ 64%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv/gen/f16-dwconv-25p16c-minmax-neonfp16arith-acc2.c.o [ 64%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv/gen/f16-dwconv-25p16c-minmax-neonfp16arith.c.o [ 64%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv/gen/f16-dwconv-25p32c-minmax-neonfp16arith-acc2.c.o [ 64%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv/gen/f16-dwconv-25p32c-minmax-neonfp16arith.c.o [ 64%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-gavgpool-cw/f16-gavgpool-cw-neonfp16arith-u8.c.o [ 64%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/Sorting.cpp.o [ 65%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-gavgpool/gen/f16-gavgpool-7p7x-minmax-neonfp16arith-c8.c.o [ 65%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-gavgpool/gen/f16-gavgpool-7p7x-minmax-neonfp16arith-c16.c.o [ 65%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-gavgpool/gen/f16-gavgpool-7p7x-minmax-neonfp16arith-c24.c.o [ 65%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-gavgpool/gen/f16-gavgpool-7p7x-minmax-neonfp16arith-c32.c.o [ 65%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-gavgpool/gen/f16-gavgpool-7x-minmax-neonfp16arith-c8.c.o [ 65%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-gavgpool/gen/f16-gavgpool-7x-minmax-neonfp16arith-c16.c.o [ 65%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-gavgpool/gen/f16-gavgpool-7x-minmax-neonfp16arith-c24.c.o [ 65%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-gavgpool/gen/f16-gavgpool-7x-minmax-neonfp16arith-c32.c.o [ 65%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/SparseTensorUtils.cpp.o [ 65%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-gemm/gen/f16-gemm-1x8-minmax-neonfp16arith-ld64.c.o [ 65%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-gemm/gen/f16-gemm-1x16-minmax-neonfp16arith-ld64.c.o [ 65%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-gemm/gen/f16-gemm-4x8-minmax-neonfp16arith-ld64.c.o [ 65%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-gemm/gen/f16-gemm-4x16-minmax-neonfp16arith-ld64.c.o [ 65%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/SpectralOps.cpp.o [ 65%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-gemm/gen/f16-gemm-6x8-minmax-neonfp16arith-ld64.c.o [ 65%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-gemm/gen/f16-gemm-6x16-minmax-neonfp16arith-ld64.c.o [ 65%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-gemm/gen/f16-gemm-8x8-minmax-neonfp16arith-ld64.c.o [ 65%] 
Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-gemm/gen/f16-gemm-8x16-minmax-neonfp16arith-ld64.c.o [ 65%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-gemm/gen/f16-gemminc-1x8-minmax-neonfp16arith-ld64.c.o [ 65%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-gemm/gen/f16-gemminc-1x16-minmax-neonfp16arith-ld64.c.o [ 65%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-gemm/gen/f16-gemminc-4x8-minmax-neonfp16arith-ld64.c.o [ 65%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-gemm/gen/f16-gemminc-4x16-minmax-neonfp16arith-ld64.c.o [ 65%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-gemm/gen/f16-gemminc-6x8-minmax-neonfp16arith-ld64.c.o [ 65%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-gemm/gen/f16-gemminc-6x16-minmax-neonfp16arith-ld64.c.o [ 65%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-gemm/gen/f16-gemminc-8x8-minmax-neonfp16arith-ld64.c.o [ 65%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-gemm/gen/f16-gemminc-8x16-minmax-neonfp16arith-ld64.c.o [ 65%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-ibilinear-chw/gen/f16-ibilinear-chw-neonfp16arith-p4.c.o [ 65%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-ibilinear-chw/gen/f16-ibilinear-chw-neonfp16arith-p8.c.o [ 65%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-ibilinear-chw/gen/f16-ibilinear-chw-neonfp16arith-p16.c.o [ 65%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-ibilinear/gen/f16-ibilinear-neonfp16arith-c8.c.o [ 65%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-ibilinear/gen/f16-ibilinear-neonfp16arith-c16.c.o [ 65%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-igemm/gen/f16-igemm-1x8-minmax-neonfp16arith-ld64.c.o [ 65%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-igemm/gen/f16-igemm-1x16-minmax-neonfp16arith-ld64.c.o [ 65%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-igemm/gen/f16-igemm-4x8-minmax-neonfp16arith-ld64.c.o [ 65%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-igemm/gen/f16-igemm-4x16-minmax-neonfp16arith-ld64.c.o [ 65%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/SummaryOps.cpp.o [ 65%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-igemm/gen/f16-igemm-6x8-minmax-neonfp16arith-ld64.c.o [ 65%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-igemm/gen/f16-igemm-6x16-minmax-neonfp16arith-ld64.c.o [ 65%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-igemm/gen/f16-igemm-8x8-minmax-neonfp16arith-ld64.c.o [ 65%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-igemm/gen/f16-igemm-8x16-minmax-neonfp16arith-ld64.c.o [ 65%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/TensorAdvancedIndexing.cpp.o [ 65%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-maxpool/f16-maxpool-9p8x-minmax-neonfp16arith-c8.c.o [ 65%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-pavgpool/f16-pavgpool-9p8x-minmax-neonfp16arith-c8.c.o [ 65%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-pavgpool/f16-pavgpool-9x-minmax-neonfp16arith-c8.c.o [ 65%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/TensorCompare.cpp.o [ 65%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-prelu/gen/f16-prelu-neonfp16arith-2x8.c.o [ 65%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-prelu/gen/f16-prelu-neonfp16arith-2x16.c.o [ 65%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-qs8-vcvt/gen/f16-qs8-vcvt-neonfp16arith-u8.c.o [ 65%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-qs8-vcvt/gen/f16-qs8-vcvt-neonfp16arith-u16.c.o [ 65%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-qs8-vcvt/gen/f16-qs8-vcvt-neonfp16arith-u24.c.o [ 65%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-qs8-vcvt/gen/f16-qs8-vcvt-neonfp16arith-u32.c.o [ 65%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-qs8-vcvt/gen/f16-qs8-vcvt-neonfp16arith-u64.c.o [ 65%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-raddstoreexpminusmax/gen/f16-raddstoreexpminusmax-neonfp16arith-rr2-p2-u32-acc2.c.o [ 65%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-raddstoreexpminusmax/gen/f16-raddstoreexpminusmax-neonfp16arith-rr2-p2-u32-acc4.c.o [ 65%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-raddstoreexpminusmax/gen/f16-raddstoreexpminusmax-neonfp16arith-rr2-p2-u32.c.o [ 65%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-raddstoreexpminusmax/gen/f16-raddstoreexpminusmax-neonfp16arith-rr2-p2-u40-acc2.c.o [ 65%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-raddstoreexpminusmax/gen/f16-raddstoreexpminusmax-neonfp16arith-rr2-p2-u40-acc5.c.o [ 65%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/TensorConversions.cpp.o [ 65%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-raddstoreexpminusmax/gen/f16-raddstoreexpminusmax-neonfp16arith-rr2-p2-u40.c.o [ 65%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-raddstoreexpminusmax/gen/f16-raddstoreexpminusmax-neonfp16arith-rr2-p2-u48-acc2.c.o [ 65%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-raddstoreexpminusmax/gen/f16-raddstoreexpminusmax-neonfp16arith-rr2-p2-u48-acc3.c.o [ 65%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-raddstoreexpminusmax/gen/f16-raddstoreexpminusmax-neonfp16arith-rr2-p2-u48.c.o [ 65%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-raddstoreexpminusmax/gen/f16-raddstoreexpminusmax-neonfp16arith-rr2-p2-u64-acc2.c.o [ 65%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-raddstoreexpminusmax/gen/f16-raddstoreexpminusmax-neonfp16arith-rr2-p2-u64-acc4.c.o [ 65%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-raddstoreexpminusmax/gen/f16-raddstoreexpminusmax-neonfp16arith-rr2-p2-u64.c.o [ 66%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-raddstoreexpminusmax/gen/f16-raddstoreexpminusmax-neonfp16arith-rr2-p2-u72-acc3.c.o [ 66%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-raddstoreexpminusmax/gen/f16-raddstoreexpminusmax-neonfp16arith-rr2-p2-u72.c.o [ 66%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-raddstoreexpminusmax/gen/f16-raddstoreexpminusmax-neonfp16arith-rr2-p2-u80-acc2.c.o [ 66%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-raddstoreexpminusmax/gen/f16-raddstoreexpminusmax-neonfp16arith-rr2-p2-u80-acc5.c.o [ 66%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-raddstoreexpminusmax/gen/f16-raddstoreexpminusmax-neonfp16arith-rr2-p2-u80.c.o [ 66%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-raddstoreexpminusmax/gen/f16-raddstoreexpminusmax-neonfp16arith-rr2-p2-u96-acc2.c.o [ 66%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-raddstoreexpminusmax/gen/f16-raddstoreexpminusmax-neonfp16arith-rr2-p2-u96-acc3.c.o [ 66%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/TensorFactories.cpp.o [ 66%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-raddstoreexpminusmax/gen/f16-raddstoreexpminusmax-neonfp16arith-rr2-p2-u96-acc6.c.o [ 66%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-raddstoreexpminusmax/gen/f16-raddstoreexpminusmax-neonfp16arith-rr2-p2-u96.c.o [ 66%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-rminmax/gen/f16-rmax-neonfp16arith-u8.c.o [ 66%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-rminmax/gen/f16-rmax-neonfp16arith-u16-acc1.c.o [ 66%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-rminmax/gen/f16-rmax-neonfp16arith-u16-acc2.c.o [ 66%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-rminmax/gen/f16-rmax-neonfp16arith-u24-acc2.c.o [ 66%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-rminmax/gen/f16-rmax-neonfp16arith-u24-acc3.c.o [ 66%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-rminmax/gen/f16-rmax-neonfp16arith-u24.c.o [ 66%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-rminmax/gen/f16-rmax-neonfp16arith-u32-acc2.c.o [ 66%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-rminmax/gen/f16-rmax-neonfp16arith-u32-acc4.c.o [ 66%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-rminmax/gen/f16-rmax-neonfp16arith-u32.c.o [ 66%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-rminmax/gen/f16-rmax-neonfp16arith-u64-acc2.c.o [ 66%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-rminmax/gen/f16-rmax-neonfp16arith-u64-acc4.c.o
[ 66%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-rminmax/gen/f16-rmax-neonfp16arith-u64.c.o
[ 66%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-rminmax/gen/f16-rmin-neonfp16arith-u8.c.o
[ 66%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-rminmax/gen/f16-rmin-neonfp16arith-u16-acc1.c.o
[ 66%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-rminmax/gen/f16-rmin-neonfp16arith-u16-acc2.c.o
[ 66%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-rminmax/gen/f16-rmin-neonfp16arith-u24-acc2.c.o
[ 66%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-rminmax/gen/f16-rmin-neonfp16arith-u24-acc3.c.o
[ 66%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-rminmax/gen/f16-rmin-neonfp16arith-u24.c.o
[ 66%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-rminmax/gen/f16-rmin-neonfp16arith-u32-acc2.c.o
[ 66%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-rminmax/gen/f16-rmin-neonfp16arith-u32-acc4.c.o
[ 66%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-rminmax/gen/f16-rmin-neonfp16arith-u32.c.o
[ 66%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-rminmax/gen/f16-rmin-neonfp16arith-u64-acc2.c.o
[ 66%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-rminmax/gen/f16-rmin-neonfp16arith-u64-acc4.c.o
[ 66%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-rminmax/gen/f16-rmin-neonfp16arith-u64.c.o
[ 66%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-rminmax/gen/f16-rminmax-neonfp16arith-u8.c.o
[ 66%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-rminmax/gen/f16-rminmax-neonfp16arith-u16-acc1.c.o
[ 66%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-rminmax/gen/f16-rminmax-neonfp16arith-u16-acc2.c.o
[ 66%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-rminmax/gen/f16-rminmax-neonfp16arith-u24-acc2.c.o
[ 66%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-rminmax/gen/f16-rminmax-neonfp16arith-u24-acc3.c.o
[ 66%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-rminmax/gen/f16-rminmax-neonfp16arith-u24.c.o
[ 66%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-rminmax/gen/f16-rminmax-neonfp16arith-u32-acc2.c.o
[ 66%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-rminmax/gen/f16-rminmax-neonfp16arith-u32-acc4.c.o
[ 66%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-rminmax/gen/f16-rminmax-neonfp16arith-u32.c.o
[ 66%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-rminmax/gen/f16-rminmax-neonfp16arith-u64-acc2.c.o
[ 66%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-rminmax/gen/f16-rminmax-neonfp16arith-u64-acc4.c.o
[ 66%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/TensorIteratorReduce.cpp.o
[ 66%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-rminmax/gen/f16-rminmax-neonfp16arith-u64.c.o
[ 66%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-rsum/gen/f16-rsum-neonfp16arith-u8.c.o
[ 66%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-rsum/gen/f16-rsum-neonfp16arith-u16-acc2.c.o
[ 66%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-rsum/gen/f16-rsum-neonfp16arith-u24-acc3.c.o
[ 66%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-rsum/gen/f16-rsum-neonfp16arith-u32-acc2.c.o
[ 66%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-rsum/gen/f16-rsum-neonfp16arith-u32-acc4.c.o
[ 66%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-spmm/gen/f16-spmm-8x1-minmax-neonfp16arith-pipelined.c.o
[ 66%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-spmm/gen/f16-spmm-8x1-minmax-neonfp16arith-x2.c.o
[ 66%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-spmm/gen/f16-spmm-8x1-minmax-neonfp16arith.c.o
[ 66%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-spmm/gen/f16-spmm-16x1-minmax-neonfp16arith-pipelined.c.o
[ 66%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-spmm/gen/f16-spmm-16x1-minmax-neonfp16arith-x2.c.o
[ 66%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-spmm/gen/f16-spmm-16x1-minmax-neonfp16arith.c.o
[ 66%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-spmm/gen/f16-spmm-24x1-minmax-neonfp16arith-pipelined.c.o
[ 66%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/TensorProperties.cpp.o
[ 66%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-spmm/gen/f16-spmm-24x1-minmax-neonfp16arith-x2.c.o
[ 66%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-spmm/gen/f16-spmm-24x1-minmax-neonfp16arith.c.o
[ 66%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/TensorShape.cpp.o
[ 66%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-spmm/gen/f16-spmm-32x1-minmax-neonfp16arith-pipelined.c.o
[ 67%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-spmm/gen/f16-spmm-32x1-minmax-neonfp16arith-x2.c.o
[ 67%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/TensorTransformations.cpp.o
[ 67%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-spmm/gen/f16-spmm-32x1-minmax-neonfp16arith.c.o
[ 67%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vadd-minmax-neonfp16arith-u8.c.o
[ 67%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vadd-minmax-neonfp16arith-u16.c.o
[ 67%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vaddc-minmax-neonfp16arith-u8.c.o
[ 67%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vaddc-minmax-neonfp16arith-u16.c.o
[ 67%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vmax-neonfp16arith-u8.c.o
[ 67%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vmax-neonfp16arith-u16.c.o
[ 67%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vmaxc-neonfp16arith-u8.c.o
[ 67%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vmaxc-neonfp16arith-u16.c.o
[ 67%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vmin-neonfp16arith-u8.c.o
[ 67%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vmin-neonfp16arith-u16.c.o
[ 67%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/TestOps.cpp.o
[ 67%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vminc-neonfp16arith-u8.c.o
[ 67%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vminc-neonfp16arith-u16.c.o
[ 67%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vmul-minmax-neonfp16arith-u8.c.o
[ 67%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vmul-minmax-neonfp16arith-u16.c.o
[ 67%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vmulc-minmax-neonfp16arith-u8.c.o
[ 67%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vmulc-minmax-neonfp16arith-u16.c.o
[ 67%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vrsubc-minmax-neonfp16arith-u8.c.o
[ 67%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/TriangularOps.cpp.o
[ 67%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vrsubc-minmax-neonfp16arith-u16.c.o
[ 67%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vsqrdiff-neonfp16arith-u8.c.o
[ 67%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vsqrdiff-neonfp16arith-u16.c.o
[ 67%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vsqrdiffc-neonfp16arith-u8.c.o
[ 67%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vsqrdiffc-neonfp16arith-u16.c.o
[ 67%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vsub-minmax-neonfp16arith-u8.c.o
[ 67%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vsub-minmax-neonfp16arith-u16.c.o
[ 67%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vsubc-minmax-neonfp16arith-u8.c.o
[ 67%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vsubc-minmax-neonfp16arith-u16.c.o
[ 67%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vclamp/gen/f16-vclamp-neonfp16arith-u8.c.o
[ 67%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/TypeProperties.cpp.o
[ 67%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vclamp/gen/f16-vclamp-neonfp16arith-u16.c.o
[ 67%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vcmul/gen/f16-vcmul-neonfp16arith-u8.c.o
[ 67%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vcmul/gen/f16-vcmul-neonfp16arith-u16.c.o
[ 67%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vcmul/gen/f16-vcmul-neonfp16arith-u32.c.o
[ 67%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-velu/gen/f16-velu-neonfp16arith-rr1-p3-u8.c.o
[ 67%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-velu/gen/f16-velu-neonfp16arith-rr1-p3-u16.c.o
[ 67%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vhswish/gen/f16-vhswish-neonfp16arith-u8.c.o
[ 67%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vhswish/gen/f16-vhswish-neonfp16arith-u16.c.o
[ 67%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vlrelu/gen/f16-vlrelu-neonfp16arith-u8.c.o
[ 67%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vlrelu/gen/f16-vlrelu-neonfp16arith-u16.c.o
[ 67%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vmulcaddc/gen/f16-vmulcaddc-c8-minmax-neonfp16arith-2x.c.o
[ 67%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vmulcaddc/gen/f16-vmulcaddc-c16-minmax-neonfp16arith-2x.c.o
[ 67%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vrnd/gen/f16-vrndd-neonfp16arith-u8.c.o
[ 67%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vrnd/gen/f16-vrndd-neonfp16arith-u16.c.o
[ 67%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vrnd/gen/f16-vrndne-neonfp16arith-u8.c.o
[ 67%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/UnaryOps.cpp.o
[ 67%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vrnd/gen/f16-vrndne-neonfp16arith-u16.c.o
[ 67%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vrnd/gen/f16-vrndu-neonfp16arith-u8.c.o
[ 67%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vrnd/gen/f16-vrndu-neonfp16arith-u16.c.o
[ 67%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/Unfold2d.cpp.o
[ 67%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vrnd/gen/f16-vrndz-neonfp16arith-u8.c.o
[ 67%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vrnd/gen/f16-vrndz-neonfp16arith-u16.c.o
[ 67%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vsigmoid/gen/f16-vsigmoid-neonfp16arith-rr2-p2-nr1fma-u8.c.o
[ 67%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vsigmoid/gen/f16-vsigmoid-neonfp16arith-rr2-p2-nr1fma-u16.c.o
[ 67%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/Unfold3d.cpp.o
[ 67%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vsigmoid/gen/f16-vsigmoid-neonfp16arith-rr2-p2-nr1fma-u24.c.o
[ 67%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vsigmoid/gen/f16-vsigmoid-neonfp16arith-rr2-p2-nr1fma-u32.c.o
[ 67%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/UnfoldBackward.cpp.o
[ 67%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vsigmoid/gen/f16-vsigmoid-neonfp16arith-rr2-p2-nr1fma-u40.c.o
[ 67%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vsigmoid/gen/f16-vsigmoid-neonfp16arith-rr2-p2-nr1fma-u48.c.o
[ 67%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vsigmoid/gen/f16-vsigmoid-neonfp16arith-rr2-p2-nr1fma-u56.c.o
[ 67%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vsigmoid/gen/f16-vsigmoid-neonfp16arith-rr2-p2-nr1fma-u64.c.o
[ 67%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vsigmoid/gen/f16-vsigmoid-neonfp16arith-rr2-p2-nr1recps-u8.c.o
[ 67%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vsigmoid/gen/f16-vsigmoid-neonfp16arith-rr2-p2-nr1recps-u16.c.o
[ 67%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vsigmoid/gen/f16-vsigmoid-neonfp16arith-rr2-p2-nr1recps-u24.c.o
[ 68%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vsigmoid/gen/f16-vsigmoid-neonfp16arith-rr2-p2-nr1recps-u32.c.o
[ 68%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vsigmoid/gen/f16-vsigmoid-neonfp16arith-rr2-p2-nr1recps-u40.c.o
[ 68%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vsigmoid/gen/f16-vsigmoid-neonfp16arith-rr2-p2-nr1recps-u48.c.o
[ 68%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vsigmoid/gen/f16-vsigmoid-neonfp16arith-rr2-p2-nr1recps-u56.c.o
[ 68%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vsigmoid/gen/f16-vsigmoid-neonfp16arith-rr2-p2-nr1recps-u64.c.o
[ 68%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vsqrt/gen/f16-vsqrt-neonfp16arith-nr1fma1adj-u8.c.o
[ 68%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vsqrt/gen/f16-vsqrt-neonfp16arith-nr1fma1adj-u16.c.o
[ 68%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/Unique.cpp.o
[ 68%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/UpSample.cpp.o
[ 68%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vsqrt/gen/f16-vsqrt-neonfp16arith-nr1fma1adj-u32.c.o
[ 68%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vtanh/gen/f16-vtanh-neonfp16arith-expm1minus-rr1-p3h2ts-nr1fma-u8.c.o
[ 68%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/UpSampleBicubic2d.cpp.o
[ 68%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vtanh/gen/f16-vtanh-neonfp16arith-expm1minus-rr1-p3h2ts-nr1fma-u16.c.o
[ 68%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vtanh/gen/f16-vtanh-neonfp16arith-expm1minus-rr1-p3h2ts-nr1fma-u24.c.o
[ 68%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vtanh/gen/f16-vtanh-neonfp16arith-expm1minus-rr1-p3h2ts-nr1fma-u32.c.o
[ 68%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vtanh/gen/f16-vtanh-neonfp16arith-expm1minus-rr1-p3h2ts-nr1fma-u40.c.o
[ 68%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vtanh/gen/f16-vtanh-neonfp16arith-expm1minus-rr1-p3h2ts-nr1fma-u48.c.o
[ 68%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vtanh/gen/f16-vtanh-neonfp16arith-expm1minus-rr1-p3h2ts-nr1fma-u56.c.o
[ 68%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vtanh/gen/f16-vtanh-neonfp16arith-expm1minus-rr1-p3h2ts-nr1fma-u64.c.o
[ 68%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vtanh/gen/f16-vtanh-neonfp16arith-expm1minus-rr1-p3h2ts-nr1fma-u72.c.o
[ 68%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/UpSampleBilinear2d.cpp.o
[ 68%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vtanh/gen/f16-vtanh-neonfp16arith-expm1minus-rr1-p3h2ts-nr1fma-u80.c.o
[ 68%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vtanh/gen/f16-vtanh-neonfp16arith-expm1minus-rr1-p3h2ts-nr1recps-u8.c.o
[ 68%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vtanh/gen/f16-vtanh-neonfp16arith-expm1minus-rr1-p3h2ts-nr1recps-u16.c.o
[ 68%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vtanh/gen/f16-vtanh-neonfp16arith-expm1minus-rr1-p3h2ts-nr1recps-u24.c.o
[ 68%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vtanh/gen/f16-vtanh-neonfp16arith-expm1minus-rr1-p3h2ts-nr1recps-u32.c.o
[ 68%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vtanh/gen/f16-vtanh-neonfp16arith-expm1minus-rr1-p3h2ts-nr1recps-u40.c.o
[ 68%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vtanh/gen/f16-vtanh-neonfp16arith-expm1minus-rr1-p3h2ts-nr1recps-u48.c.o
[ 68%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vtanh/gen/f16-vtanh-neonfp16arith-expm1minus-rr1-p3h2ts-nr1recps-u56.c.o
[ 68%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vtanh/gen/f16-vtanh-neonfp16arith-expm1minus-rr1-p3h2ts-nr1recps-u64.c.o
[ 68%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vtanh/gen/f16-vtanh-neonfp16arith-expm1minus-rr1-p3h2ts-nr1recps-u72.c.o
[ 68%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vtanh/gen/f16-vtanh-neonfp16arith-expm1minus-rr1-p3h2ts-nr1recps-u80.c.o
[ 68%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/UpSampleLinear1d.cpp.o
[ 68%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vtanh/gen/f16-vtanh-neonfp16arith-expm1minus-rr1-p3h2ts-recpeadj-u8.c.o
[ 68%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vtanh/gen/f16-vtanh-neonfp16arith-expm1minus-rr1-p3h2ts-recpeadj-u16.c.o
[ 68%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vtanh/gen/f16-vtanh-neonfp16arith-expm1minus-rr1-p3h2ts-recpeadj-u24.c.o
[ 68%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vtanh/gen/f16-vtanh-neonfp16arith-expm1minus-rr1-p3h2ts-recpeadj-u32.c.o
[ 68%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vtanh/gen/f16-vtanh-neonfp16arith-expm1minus-rr1-p3h2ts-recpeadj-u40.c.o
[ 68%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vtanh/gen/f16-vtanh-neonfp16arith-expm1minus-rr1-p3h2ts-recpeadj-u48.c.o
[ 68%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/UpSampleNearest1d.cpp.o
[ 68%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vtanh/gen/f16-vtanh-neonfp16arith-expm1minus-rr1-p3h2ts-recpeadj-u56.c.o
[ 68%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vtanh/gen/f16-vtanh-neonfp16arith-expm1minus-rr1-p3h2ts-recpeadj-u64.c.o
[ 68%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vtanh/gen/f16-vtanh-neonfp16arith-expm1minus-rr1-p3h2ts-recpeadj-u72.c.o
[ 68%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vtanh/gen/f16-vtanh-neonfp16arith-expm1minus-rr1-p3h2ts-recpeadj-u80.c.o
[ 68%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vunary/gen/f16-vabs-neonfp16arith-u8.c.o
[ 68%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vunary/gen/f16-vabs-neonfp16arith-u16.c.o
[ 68%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vunary/gen/f16-vneg-neonfp16arith-u8.c.o
[ 68%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vunary/gen/f16-vneg-neonfp16arith-u16.c.o
[ 68%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vunary/gen/f16-vsqr-neonfp16arith-u8.c.o
[ 68%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vunary/gen/f16-vsqr-neonfp16arith-u16.c.o
[ 68%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/UpSampleNearest2d.cpp.o
[ 68%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f16-exp-neonfp16arith-rr2-p3.c.o
[ 68%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f16-expm1minus-neonfp16arith-rr1-p3.c.o
[ 68%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f16-expm1minus-neonfp16arith-rr2-p3.c.o
[ 68%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f16-expminus-neonfp16arith-rr1-p2.c.o
[ 68%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f16-expminus-neonfp16arith-rr1-p3.c.o
[ 68%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/UpSampleNearest3d.cpp.o
[ 68%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f16-expminus-neonfp16arith-rr2-p2.c.o
[ 68%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f16-expminus-neonfp16arith-rr2-p3.c.o
[ 68%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/UpSampleTrilinear3d.cpp.o
[ 68%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f16-sigmoid-neonfp16arith-rr2-p2-nr1fma.c.o
[ 68%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f16-sigmoid-neonfp16arith-rr2-p2-nr1recps.c.o
[ 68%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f16-sigmoid-neonfp16arith-rr2-p2-recpe.c.o
[ 68%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f16-sigmoid-neonfp16arith-rr2-p3-nr1fma.c.o
[ 68%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f16-sigmoid-neonfp16arith-rr2-p3-nr1recps.c.o
[ 68%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f16-sigmoid-neonfp16arith-rr2-p3-recpe.c.o
[ 68%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f16-sqrt-neonfp16arith-nr1fma1adj.c.o
[ 68%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f16-sqrt-neonfp16arith-nr1fma.c.o
[ 69%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f16-sqrt-neonfp16arith-nr1rsqrts.c.o
[ 69%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f16-tanh-neonfp16arith-expm1minus-rr1-p3h1ts-nr1fma.c.o
[ 69%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f16-tanh-neonfp16arith-expm1minus-rr1-p3h1ts-nr1fmaadj.c.o
[ 69%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f16-tanh-neonfp16arith-expm1minus-rr1-p3h1ts-nr1recps.c.o
[ 69%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f16-tanh-neonfp16arith-expm1minus-rr1-p3h1ts-nr1recpsadj.c.o
[ 69%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/VariableMethodStubs.cpp.o
[ 69%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f16-tanh-neonfp16arith-expm1minus-rr1-p3h1ts-recpe.c.o
[ 69%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f16-tanh-neonfp16arith-expm1minus-rr1-p3h1ts-recpeadj.c.o
[ 69%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f16-tanh-neonfp16arith-expm1minus-rr1-p3h2ts-nr1fma.c.o
[ 69%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f16-tanh-neonfp16arith-expm1minus-rr1-p3h2ts-nr1fmaadj.c.o
[ 69%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/WeightNorm.cpp.o
[ 69%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f16-tanh-neonfp16arith-expm1minus-rr1-p3h2ts-nr1recps.c.o
[ 69%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f16-tanh-neonfp16arith-expm1minus-rr1-p3h2ts-nr1recpsadj.c.o
[ 69%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/group_norm.cpp.o
[ 69%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f16-tanh-neonfp16arith-expm1minus-rr1-p3h2ts-recpe.c.o
[ 69%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f16-tanh-neonfp16arith-expm1minus-rr1-p3h2ts-recpeadj.c.o
[ 69%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc4w-gemm/gen/qd8-f16-qc4w-gemm-1x16-minmax-neonfp16arith-mlal-lane-prfm.c.o
[ 69%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc4w-gemm/gen/qd8-f16-qc4w-gemm-1x16-minmax-neonfp16arith-mlal-lane.c.o
[ 69%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc4w-gemm/gen/qd8-f16-qc4w-gemm-2x16-minmax-neonfp16arith-mlal-lane-prfm.c.o
[ 69%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc4w-gemm/gen/qd8-f16-qc4w-gemm-2x16-minmax-neonfp16arith-mlal-lane.c.o
[ 69%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc4w-gemm/gen/qd8-f16-qc4w-gemm-3x16-minmax-neonfp16arith-mlal-lane-prfm.c.o
[ 69%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/layer_norm.cpp.o
[ 69%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc4w-gemm/gen/qd8-f16-qc4w-gemm-3x16-minmax-neonfp16arith-mlal-lane.c.o
[ 69%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc4w-gemm/gen/qd8-f16-qc4w-gemm-4x16-minmax-neonfp16arith-mlal-lane-prfm.c.o
[ 69%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc4w-gemm/gen/qd8-f16-qc4w-gemm-4x16-minmax-neonfp16arith-mlal-lane.c.o
[ 69%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc4w-gemm/gen/qd8-f16-qc4w-gemm-6x16-minmax-neonfp16arith-mlal-lane-prfm.c.o
[ 69%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc4w-gemm/gen/qd8-f16-qc4w-gemm-6x16-minmax-neonfp16arith-mlal-lane.c.o
[ 70%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/prim_native_functions.cpp.o
[ 70%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc8w-gemm/gen/qd8-f16-qc8w-gemm-1x8c2s4-minmax-neonfp16arith.c.o
[ 70%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc8w-gemm/gen/qd8-f16-qc8w-gemm-2x8c2s4-minmax-neonfp16arith.c.o
[ 70%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/verbose_wrapper.cpp.o
[ 70%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc8w-igemm/gen/qd8-f16-qc8w-igemm-1x8c2s4-minmax-neonfp16arith-mlal.c.o
[ 70%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/ao_sparse/library.cpp.o
[ 70%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc8w-igemm/gen/qd8-f16-qc8w-igemm-2x8c2s4-minmax-neonfp16arith-mlal.c.o
[ 70%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-f16-vcvt/gen/qs8-f16-vcvt-neonfp16arith-u8.c.o
[ 70%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-f16-vcvt/gen/qs8-f16-vcvt-neonfp16arith-u16.c.o
[ 70%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-f16-vcvt/gen/qs8-f16-vcvt-neonfp16arith-u24.c.o
[ 70%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-f16-vcvt/gen/qs8-f16-vcvt-neonfp16arith-u32.c.o
[ 70%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/ao_sparse/quantized/cpu/fbgemm_utils.cpp.o
[ 70%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vdiv-minmax-aarch64-neonfp16arith-u8.c.o
[ 70%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vdiv-minmax-aarch64-neonfp16arith-u16.c.o
[ 70%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vdivc-minmax-aarch64-neonfp16arith-u8.c.o
[ 70%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vdivc-minmax-aarch64-neonfp16arith-u16.c.o
[ 70%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vrdivc-minmax-aarch64-neonfp16arith-u8.c.o
[ 70%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vrdivc-minmax-aarch64-neonfp16arith-u16.c.o
[ 70%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vsigmoid/gen/f16-vsigmoid-aarch64-neonfp16arith-rr2-p2-div-u8.c.o
[ 70%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vsigmoid/gen/f16-vsigmoid-aarch64-neonfp16arith-rr2-p2-div-u16.c.o
[ 70%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vsigmoid/gen/f16-vsigmoid-aarch64-neonfp16arith-rr2-p2-div-u24.c.o
[ 70%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vsigmoid/gen/f16-vsigmoid-aarch64-neonfp16arith-rr2-p2-div-u32.c.o
[ 70%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vsigmoid/gen/f16-vsigmoid-aarch64-neonfp16arith-rr2-p2-div-u40.c.o
[ 70%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/ao_sparse/quantized/cpu/qlinear.cpp.o
[ 70%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vsigmoid/gen/f16-vsigmoid-aarch64-neonfp16arith-rr2-p2-div-u48.c.o
[ 70%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vsigmoid/gen/f16-vsigmoid-aarch64-neonfp16arith-rr2-p2-div-u56.c.o
[ 70%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vsigmoid/gen/f16-vsigmoid-aarch64-neonfp16arith-rr2-p2-div-u64.c.o
[ 70%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vsqrt/gen/f16-vsqrt-aarch64-neonfp16arith-sqrt-u8.c.o
[ 70%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vsqrt/gen/f16-vsqrt-aarch64-neonfp16arith-sqrt-u16.c.o
[ 70%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/ao_sparse/quantized/cpu/qlinear_deserialize.cpp.o
[ 70%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vsqrt/gen/f16-vsqrt-aarch64-neonfp16arith-sqrt-u32.c.o
[ 70%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vtanh/gen/f16-vtanh-aarch64-neonfp16arith-expm1minus-rr1-p3h2ts-div-u8.c.o
[ 70%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vtanh/gen/f16-vtanh-aarch64-neonfp16arith-expm1minus-rr1-p3h2ts-div-u16.c.o
[ 70%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vtanh/gen/f16-vtanh-aarch64-neonfp16arith-expm1minus-rr1-p3h2ts-div-u24.c.o
[ 70%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vtanh/gen/f16-vtanh-aarch64-neonfp16arith-expm1minus-rr1-p3h2ts-div-u32.c.o
[ 70%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vtanh/gen/f16-vtanh-aarch64-neonfp16arith-expm1minus-rr1-p3h2ts-div-u40.c.o
[ 70%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vtanh/gen/f16-vtanh-aarch64-neonfp16arith-expm1minus-rr1-p3h2ts-div-u48.c.o
[ 70%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vtanh/gen/f16-vtanh-aarch64-neonfp16arith-expm1minus-rr1-p3h2ts-div-u56.c.o
[ 70%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vtanh/gen/f16-vtanh-aarch64-neonfp16arith-expm1minus-rr1-p3h2ts-div-u64.c.o
[ 70%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vtanh/gen/f16-vtanh-aarch64-neonfp16arith-expm1minus-rr1-p3h2ts-div-u72.c.o
[ 70%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vtanh/gen/f16-vtanh-aarch64-neonfp16arith-expm1minus-rr1-p3h2ts-div-u80.c.o
[ 70%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f16-sigmoid-aarch64-neonfp16arith-rr1-p2-div.c.o
[ 70%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f16-sigmoid-aarch64-neonfp16arith-rr1-p3-div.c.o
[ 71%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f16-sigmoid-aarch64-neonfp16arith-rr2-p2-div.c.o
[ 71%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f16-sigmoid-aarch64-neonfp16arith-rr2-p3-div.c.o
[ 71%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f16-sqrt-aarch64-neonfp16arith-sqrt.c.o
[ 71%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f16-tanh-aarch64-neonfp16arith-expm1minus-rr1-p3h1ts-div.c.o
[ 71%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/ao_sparse/quantized/cpu/qlinear_dynamic.cpp.o
[ 71%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/ao_sparse/quantized/cpu/qlinear_prepack.cpp.o
[ 71%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f16-tanh-aarch64-neonfp16arith-expm1minus-rr1-p3h2ts-div.c.o
[ 71%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-1x8c4-minmax-neondot.c.o
[ 71%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-1x16c4-minmax-neondot.c.o
[ 71%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-2x8c4-minmax-neondot.c.o
[ 71%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-2x16c4-minmax-neondot.c.o
[ 71%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-3x16c4-minmax-neondot.c.o
[ 71%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-4x8c4-minmax-neondot.c.o
[ 71%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-4x16c4-minmax-neondot.c.o
[ 71%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-6x8c4-minmax-neondot.c.o
[ 71%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-6x16c4-minmax-neondot.c.o
[ 71%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-1x8c4-minmax-neondot.c.o
[ 71%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-1x8c8-minmax-neondot-ld64.c.o
[ 71%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/ao_sparse/quantized/cpu/qlinear_serialize.cpp.o
[ 71%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-1x16c4-minmax-neondot.c.o
[ 71%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-1x16c8-minmax-neondot-ld64.c.o
[ 71%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-2x8c4-minmax-neondot.c.o
[ 71%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-2x16c4-minmax-neondot.c.o
[ 71%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-3x8c4-minmax-neondot.c.o
[ 71%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-3x16c4-minmax-neondot.c.o
[ 71%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-4x8c4-minmax-neondot.c.o
[ 71%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-4x16c4-minmax-neondot.c.o
[ 71%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-5x8c4-minmax-neondot.c.o
[ 71%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/ao_sparse/quantized/cpu/qlinear_unpack.cpp.o
[ 71%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/sparse/FlattenIndicesKernel.cpp.o
[ 71%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-5x16c4-minmax-neondot.c.o
[ 71%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-6x8c4-minmax-neondot.c.o
[ 71%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-6x16c4-minmax-neondot.c.o
[ 71%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-1x8c4-minmax-neondot.c.o
[ 71%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-1x8c8-minmax-neondot-ld64.c.o
[ 71%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-1x16c4-minmax-neondot.c.o
[ 71%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-1x16c8-minmax-neondot-ld64.c.o
[ 71%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-1x32c4-minmax-neondot.c.o
[ 71%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-2x8c4-minmax-neondot.c.o
[ 71%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-2x16c4-minmax-neondot.c.o
[ 71%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-2x32c4-minmax-neondot.c.o
[ 71%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-4x8c4-minmax-neondot.c.o
[ 71%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-4x16c4-minmax-neondot.c.o
[ 71%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-4x32c4-minmax-neondot.c.o
[ 71%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-6x8c4-minmax-neondot.c.o
[ 71%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/sparse/ParamUtils.cpp.o
[ 71%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-6x16c4-minmax-neondot.c.o
[ 71%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-6x32c4-minmax-neondot.c.o
[ 71%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-8x8c4-minmax-neondot.c.o
[ 71%] Building C object
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-8x16c4-minmax-neondot.c.o [ 71%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/sparse/SoftMax.cpp.o [ 71%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/sparse/SparseBinaryOpIntersectionKernel.cpp.o [ 71%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-8x32c4-minmax-neondot.c.o [ 71%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-1x8c4-minmax-fp32-neondot.c.o [ 71%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-1x8c8-minmax-fp32-neondot-ld64.c.o [ 71%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-1x16c4-minmax-fp32-neondot.c.o [ 71%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-1x16c8-minmax-fp32-neondot-ld64.c.o [ 71%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-4x8c4-minmax-fp32-neondot.c.o [ 71%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-4x16c4-minmax-fp32-neondot.c.o [ 71%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-6x8c4-minmax-fp32-neondot.c.o [ 71%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/sparse/SparseBlas.cpp.o [ 71%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-6x16c4-minmax-fp32-neondot.c.o [ 71%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-8x8c4-minmax-fp32-neondot.c.o [ 71%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-8x16c4-minmax-fp32-neondot.c.o [ 71%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-1x8c4-minmax-fp32-neondot.c.o [ 71%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-1x8c8-minmax-fp32-neondot-ld64.c.o [ 71%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-1x16c4-minmax-fp32-neondot.c.o [ 71%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-1x16c8-minmax-fp32-neondot-ld64.c.o [ 71%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-4x8c4-minmax-fp32-neondot.c.o [ 72%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-4x16c4-minmax-fp32-neondot.c.o [ 72%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-6x8c4-minmax-fp32-neondot.c.o [ 72%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-6x16c4-minmax-fp32-neondot.c.o [ 72%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-8x8c4-minmax-fp32-neondot.c.o [ 72%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/sparse/SparseBlasImpl.cpp.o [ 72%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-8x16c4-minmax-fp32-neondot.c.o [ 72%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-1x8c4-minmax-rndnu-neondot.c.o [ 72%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-1x16c4-minmax-fp32-neondot.c.o [ 72%] Building CXX object 
caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/sparse/SparseCsrTensor.cpp.o [ 72%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-1x16c4-minmax-rndnu-neondot.c.o [ 72%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-1x32c4-minmax-rndnu-neondot.c.o [ 72%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-2x8c4-minmax-rndnu-neondot.c.o [ 72%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-2x16c4-minmax-fp32-neondot.c.o [ 72%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-2x16c4-minmax-rndnu-neondot.c.o [ 72%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-2x32c4-minmax-rndnu-neondot.c.o [ 72%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-3x8c4-minmax-rndnu-neondot.c.o [ 72%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-3x16c4-minmax-rndnu-neondot.c.o [ 72%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-3x32c4-minmax-rndnu-neondot.c.o [ 72%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/sparse/SparseCsrTensorMath.cpp.o [ 72%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-4x8c4-minmax-rndnu-neondot.c.o [ 72%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-4x16c4-minmax-fp32-neondot.c.o [ 72%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-4x16c4-minmax-rndnu-neondot.c.o [ 72%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-5x8c4-minmax-rndnu-neondot.c.o [ 72%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-5x16c4-minmax-rndnu-neondot.c.o [ 72%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-6x8c4-minmax-rndnu-neondot.c.o [ 72%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-6x16c4-minmax-rndnu-neondot.c.o [ 72%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-8x8c4-minmax-rndnu-neondot.c.o [ 72%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-8x16c4-minmax-rndnu-neondot.c.o [ 72%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-1x8c4-minmax-rndnu-neondot.c.o [ 72%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-1x16c4-minmax-fp32-neondot.c.o [ 72%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-1x16c4-minmax-rndnu-neondot.c.o [ 72%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/sparse/SparseFactories.cpp.o [ 72%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-1x32c4-minmax-rndnu-neondot.c.o [ 72%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-2x8c4-minmax-rndnu-neondot.c.o [ 72%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-2x16c4-minmax-fp32-neondot.c.o [ 72%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/sparse/SparseMatMul.cpp.o [ 72%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-2x16c4-minmax-rndnu-neondot.c.o [ 72%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-2x32c4-minmax-rndnu-neondot.c.o [ 72%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-3x8c4-minmax-rndnu-neondot.c.o [ 72%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-3x16c4-minmax-rndnu-neondot.c.o [ 72%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-3x32c4-minmax-rndnu-neondot.c.o [ 72%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-4x8c4-minmax-rndnu-neondot.c.o [ 72%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-4x16c4-minmax-fp32-neondot.c.o [ 72%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-4x16c4-minmax-rndnu-neondot.c.o [ 72%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-5x8c4-minmax-rndnu-neondot.c.o [ 72%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-5x16c4-minmax-rndnu-neondot.c.o [ 72%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-6x8c4-minmax-rndnu-neondot.c.o [ 72%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-6x16c4-minmax-rndnu-neondot.c.o [ 72%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/sparse/SparseTensor.cpp.o [ 72%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-8x8c4-minmax-rndnu-neondot.c.o [ 72%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-8x16c4-minmax-rndnu-neondot.c.o [ 72%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-1x8c8-minmax-aarch64-neondot-ld128.c.o [ 72%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-1x16c8-minmax-aarch64-neondot-ld128.c.o [ 72%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-1x8c8-minmax-aarch64-neondot-ld128.c.o [ 72%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-1x16c8-minmax-aarch64-neondot-ld128.c.o [ 72%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-1x8c8-minmax-fp32-aarch64-neondot-ld128.c.o [ 72%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-1x16c8-minmax-fp32-aarch64-neondot-ld128.c.o [ 72%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/sparse/SparseTensorMath.cpp.o [ 72%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-1x8c8-minmax-fp32-aarch64-neondot-ld128.c.o [ 72%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-1x16c8-minmax-fp32-aarch64-neondot-ld128.c.o [ 72%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-gemm/gen/f16-gemm-1x8-minmax-asm-aarch64-neonfp16arith-ld64.S.o [ 72%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-gemm/gen/f16-gemm-1x16-minmax-asm-aarch64-neonfp16arith-ld32.S.o [ 72%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-gemm/gen/f16-gemm-1x16-minmax-asm-aarch64-neonfp16arith-ld64.S.o [ 72%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-gemm/gen/f16-gemm-4x8-minmax-asm-aarch64-neonfp16arith-ld64.S.o [ 72%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-gemm/gen/f16-gemm-4x16-minmax-asm-aarch64-neonfp16arith-ld32.S.o [ 72%] Building ASM object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-gemm/gen/f16-gemm-4x16-minmax-asm-aarch64-neonfp16arith-ld64.S.o [ 73%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-gemm/gen/f16-gemm-6x8-minmax-asm-aarch64-neonfp16arith-ld64.S.o [ 73%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-gemm/gen/f16-gemm-6x16-minmax-asm-aarch64-neonfp16arith-cortex-a55.S.o [ 73%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-gemm/gen/f16-gemm-6x16-minmax-asm-aarch64-neonfp16arith-cortex-a55r0.S.o [ 73%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-gemm/gen/f16-gemm-6x16-minmax-asm-aarch64-neonfp16arith-cortex-a75.S.o [ 73%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-gemm/gen/f16-gemm-6x16-minmax-asm-aarch64-neonfp16arith-ld32.S.o [ 73%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-gemm/gen/f16-gemm-6x16-minmax-asm-aarch64-neonfp16arith-ld64.S.o [ 73%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-gemm/gen/f16-gemm-8x8-minmax-asm-aarch64-neonfp16arith-ld64.S.o [ 73%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-gemm/gen/f16-gemminc-1x8-minmax-asm-aarch64-neonfp16arith-ld64.S.o [ 73%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-gemm/gen/f16-gemminc-1x16-minmax-asm-aarch64-neonfp16arith-ld32.S.o [ 73%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-gemm/gen/f16-gemminc-4x8-minmax-asm-aarch64-neonfp16arith-ld64.S.o [ 73%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-gemm/gen/f16-gemminc-4x16-minmax-asm-aarch64-neonfp16arith-ld32.S.o [ 73%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-gemm/gen/f16-gemminc-6x8-minmax-asm-aarch64-neonfp16arith-ld64.S.o [ 73%] Building ASM object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-gemm/gen/f16-gemminc-6x16-minmax-asm-aarch64-neonfp16arith-cortex-a55.S.o [ 73%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-gemm/gen/f16-gemminc-6x16-minmax-asm-aarch64-neonfp16arith-cortex-a75.S.o [ 73%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-gemm/gen/f16-gemminc-6x16-minmax-asm-aarch64-neonfp16arith-ld32.S.o [ 73%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-gemm/gen/f16-gemminc-8x8-minmax-asm-aarch64-neonfp16arith-ld64.S.o [ 73%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-igemm/f16-igemm-1x16-minmax-asm-aarch64-neonfp16arith-ld32.S.o [ 73%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-igemm/f16-igemm-1x16-minmax-asm-aarch64-neonfp16arith-ld64.S.o [ 73%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-igemm/f16-igemm-4x16-minmax-asm-aarch64-neonfp16arith-ld32.S.o [ 73%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-igemm/f16-igemm-4x16-minmax-asm-aarch64-neonfp16arith-ld64.S.o [ 73%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-igemm/f16-igemm-6x16-minmax-asm-aarch64-neonfp16arith-cortex-a55.S.o [ 73%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-igemm/f16-igemm-6x16-minmax-asm-aarch64-neonfp16arith-cortex-a55r0.S.o [ 73%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-igemm/f16-igemm-6x16-minmax-asm-aarch64-neonfp16arith-cortex-a75.S.o [ 73%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-igemm/f16-igemm-6x16-minmax-asm-aarch64-neonfp16arith-ld32.S.o [ 73%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-igemm/f16-igemm-6x16-minmax-asm-aarch64-neonfp16arith-ld64.S.o [ 73%] Building ASM object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/f32-dwconv-9p4c-minmax-asm-aarch64-neonfma-cortex-a55.S.o [ 73%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/f32-dwconv-9p4c-minmax-asm-aarch64-neonfma.S.o [ 73%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-1x8-minmax-asm-aarch64-neon-ld128-acc2-prfm.S.o [ 73%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-1x8-minmax-asm-aarch64-neon-ld128-acc2.S.o [ 73%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-1x8-minmax-asm-aarch64-neonfma-cortex-a53-prfm.S.o [ 73%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-1x8-minmax-asm-aarch64-neonfma-cortex-a53.S.o [ 73%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-1x8-minmax-asm-aarch64-neonfma-cortex-a75-prfm.S.o [ 73%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-1x8-minmax-asm-aarch64-neonfma-cortex-a75.S.o [ 73%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-1x8-minmax-asm-aarch64-neonfma-ld64-acc2-prfm.S.o [ 73%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-1x8-minmax-asm-aarch64-neonfma-ld64-acc2.S.o [ 73%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-1x8-minmax-asm-aarch64-neonfma-ld64-acc4-prfm.S.o [ 73%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-1x8-minmax-asm-aarch64-neonfma-ld64-acc4.S.o [ 73%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-1x8-minmax-asm-aarch64-neonfma-ld64-prfm.S.o [ 73%] Building ASM object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-1x8-minmax-asm-aarch64-neonfma-ld64.S.o [ 73%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-1x8-minmax-asm-aarch64-neonfma-ld128-acc2-prfm.S.o [ 73%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-1x8-minmax-asm-aarch64-neonfma-ld128-acc2.S.o [ 73%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-1x8-minmax-asm-aarch64-neonfma-ld128-acc4-prfm.S.o [ 73%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-1x8-minmax-asm-aarch64-neonfma-ld128-acc4.S.o [ 73%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-1x8-minmax-asm-aarch64-neonfma-ld128-prfm.S.o [ 73%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-1x8-minmax-asm-aarch64-neonfma-ld128.S.o [ 73%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-1x12-minmax-asm-aarch64-neonfma-cortex-a53.S.o [ 73%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-4x1-minmax-asm-aarch64-neonfma-ld64.S.o [ 73%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-4x1-minmax-asm-aarch64-neonfma-ld128.S.o [ 73%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-4x2-minmax-asm-aarch64-neonfma-cortex-a75-prfm.S.o [ 73%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-4x2-minmax-asm-aarch64-neonfma-cortex-a75.S.o [ 73%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-4x2-minmax-asm-aarch64-neonfma-ld64.S.o [ 73%] Building ASM object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-4x2-minmax-asm-aarch64-neonfma-ld128.S.o [ 73%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-4x8-minmax-asm-aarch64-neonfma-cortex-a53-prfm.S.o [ 73%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-4x8-minmax-asm-aarch64-neonfma-cortex-a53.S.o [ 73%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-4x8-minmax-asm-aarch64-neonfma-cortex-a55.S.o [ 73%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-4x8-minmax-asm-aarch64-neonfma-cortex-a75-prfm.S.o [ 73%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-4x8-minmax-asm-aarch64-neonfma-cortex-a75.S.o [ 73%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-4x8-minmax-asm-aarch64-neonfma-ld64.S.o [ 73%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-4x8-minmax-asm-aarch64-neonfma-ld128.S.o [ 73%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-4x12-minmax-asm-aarch64-neonfma-cortex-a53.S.o [ 74%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-5x8-minmax-asm-aarch64-neonfma-cortex-a75-prfm.S.o [ 74%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-5x8-minmax-asm-aarch64-neonfma-cortex-a75.S.o [ 74%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-6x8-minmax-asm-aarch64-neonfma-cortex-a53-prfm.S.o [ 74%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-6x8-minmax-asm-aarch64-neonfma-cortex-a53.S.o [ 74%] Building ASM object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-6x8-minmax-asm-aarch64-neonfma-cortex-a55.S.o [ 74%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-6x8-minmax-asm-aarch64-neonfma-cortex-a73.S.o [ 74%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-6x8-minmax-asm-aarch64-neonfma-cortex-a75-prfm.S.o [ 74%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-6x8-minmax-asm-aarch64-neonfma-cortex-a75.S.o [ 74%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-6x8-minmax-asm-aarch64-neonfma-ld64.S.o [ 74%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-6x8-minmax-asm-aarch64-neonfma-ld128.S.o [ 74%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-goi-1x8-minmax-asm-aarch64-neonfma-ld128-prfm.S.o [ 74%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-goi-1x8-minmax-asm-aarch64-neonfma-ld128.S.o [ 74%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-goi-4x8-minmax-asm-aarch64-neonfma-ld128.S.o [ 74%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemminc/gen/f32-gemminc-1x8-minmax-asm-aarch64-neonfma-cortex-a53.S.o [ 74%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemminc/gen/f32-gemminc-1x8-minmax-asm-aarch64-neonfma-cortex-a75-prfm.S.o [ 74%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemminc/gen/f32-gemminc-1x8-minmax-asm-aarch64-neonfma-cortex-a75.S.o [ 74%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemminc/gen/f32-gemminc-1x8-minmax-asm-aarch64-neonfma-ld64.S.o [ 74%] Building ASM object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemminc/gen/f32-gemminc-1x12-minmax-asm-aarch64-neonfma-cortex-a53.S.o [ 74%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemminc/gen/f32-gemminc-4x8-minmax-asm-aarch64-neonfma-cortex-a53.S.o [ 74%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemminc/gen/f32-gemminc-4x8-minmax-asm-aarch64-neonfma-cortex-a55.S.o [ 74%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemminc/gen/f32-gemminc-4x8-minmax-asm-aarch64-neonfma-cortex-a75-prfm.S.o [ 74%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemminc/gen/f32-gemminc-4x8-minmax-asm-aarch64-neonfma-cortex-a75.S.o [ 74%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemminc/gen/f32-gemminc-4x8-minmax-asm-aarch64-neonfma-ld64.S.o [ 74%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemminc/gen/f32-gemminc-4x8-minmax-asm-aarch64-neonfma-ld128.S.o [ 74%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemminc/gen/f32-gemminc-4x12-minmax-asm-aarch64-neonfma-cortex-a53.S.o [ 74%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemminc/gen/f32-gemminc-5x8-minmax-asm-aarch64-neonfma-cortex-a75-prfm.S.o [ 74%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemminc/gen/f32-gemminc-5x8-minmax-asm-aarch64-neonfma-cortex-a75.S.o [ 74%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemminc/gen/f32-gemminc-6x8-minmax-asm-aarch64-neonfma-cortex-a53.S.o [ 74%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemminc/gen/f32-gemminc-6x8-minmax-asm-aarch64-neonfma-cortex-a55.S.o [ 74%] Building ASM object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemminc/gen/f32-gemminc-6x8-minmax-asm-aarch64-neonfma-cortex-a73.S.o
[ 74%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemminc/gen/f32-gemminc-6x8-minmax-asm-aarch64-neonfma-cortex-a75-prfm.S.o
[ 74%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemminc/gen/f32-gemminc-6x8-minmax-asm-aarch64-neonfma-cortex-a75.S.o
[ 74%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemminc/gen/f32-gemminc-6x8-minmax-asm-aarch64-neonfma-ld64.S.o
[ 74%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemminc/gen/f32-gemminc-6x8-minmax-asm-aarch64-neonfma-ld128.S.o
[ 74%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/f32-igemm-1x12-minmax-asm-aarch64-neonfma-cortex-a53.S.o
[ 74%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/f32-igemm-4x8-minmax-asm-aarch64-neonfma-cortex-a55.S.o
[ 74%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/f32-igemm-4x12-minmax-asm-aarch64-neonfma-cortex-a53.S.o
[ 74%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/f32-igemm-6x8-minmax-asm-aarch64-neonfma-cortex-a55.S.o
[ 74%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/f32-igemm-6x8-minmax-asm-aarch64-neonfma-cortex-a73.S.o
[ 74%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-1x8-minmax-asm-aarch64-neonfma-cortex-a53-prfm.S.o
[ 74%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-1x8-minmax-asm-aarch64-neonfma-cortex-a53.S.o
[ 74%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-1x8-minmax-asm-aarch64-neonfma-cortex-a75-prfm.S.o
[ 74%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-1x8-minmax-asm-aarch64-neonfma-cortex-a75.S.o
[ 74%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-1x8-minmax-asm-aarch64-neonfma-ld64-prfm.S.o
[ 74%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-1x8-minmax-asm-aarch64-neonfma-ld64.S.o
[ 74%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-4x2-minmax-asm-aarch64-neonfma-cortex-a75-prfm.S.o
[ 74%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-4x2-minmax-asm-aarch64-neonfma-cortex-a75.S.o
[ 74%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-4x2-minmax-asm-aarch64-neonfma-ld64.S.o
[ 74%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-4x8-minmax-asm-aarch64-neonfma-cortex-a53-prfm.S.o
[ 74%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-4x8-minmax-asm-aarch64-neonfma-cortex-a53.S.o
[ 74%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-4x8-minmax-asm-aarch64-neonfma-cortex-a75-prfm.S.o
[ 74%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-4x8-minmax-asm-aarch64-neonfma-cortex-a75.S.o
[ 74%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-4x8-minmax-asm-aarch64-neonfma-ld64.S.o
[ 74%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-4x8-minmax-asm-aarch64-neonfma-ld128.S.o
[ 74%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-5x8-minmax-asm-aarch64-neonfma-cortex-a75-prfm.S.o
[ 74%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-5x8-minmax-asm-aarch64-neonfma-cortex-a75.S.o
[ 74%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-6x8-minmax-asm-aarch64-neonfma-cortex-a53-prfm.S.o
[ 74%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-6x8-minmax-asm-aarch64-neonfma-cortex-a53.S.o
[ 74%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-6x8-minmax-asm-aarch64-neonfma-cortex-a75-prfm.S.o
[ 74%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-6x8-minmax-asm-aarch64-neonfma-cortex-a75.S.o
[ 75%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-6x8-minmax-asm-aarch64-neonfma-ld64.S.o
[ 75%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-6x8-minmax-asm-aarch64-neonfma-ld128.S.o
[ 75%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-ppmm/gen/f32-ppmm-4x8-minmax-asm-aarch64-neonfma-cortex-a75-prfm.S.o
[ 75%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-ppmm/gen/f32-ppmm-4x8-minmax-asm-aarch64-neonfma-cortex-a75.S.o
[ 75%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-ppmm/gen/f32-ppmm-4x8-minmax-asm-aarch64-neonfma-ld128-prfm.S.o
[ 75%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-ppmm/gen/f32-ppmm-4x8-minmax-asm-aarch64-neonfma-ld128.S.o
[ 75%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-ppmm/gen/f32-ppmm-8x8-minmax-asm-aarch64-neonfma-cortex-a75-prfm.S.o
[ 75%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-ppmm/gen/f32-ppmm-8x8-minmax-asm-aarch64-neonfma-cortex-a75.S.o
[ 75%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-ppmm/gen/f32-ppmm-8x8-minmax-asm-aarch64-neonfma-ld128-prfm.S.o
[ 75%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-ppmm/gen/f32-ppmm-8x8-minmax-asm-aarch64-neonfma-ld128.S.o
[ 75%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc4w-gemm/gen/f32-qc4w-gemm-1x8-minmax-asm-aarch64-neon-ld128-acc2-prfm.S.o
[ 75%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc4w-gemm/gen/f32-qc4w-gemm-1x8-minmax-asm-aarch64-neon-ld128-acc2.S.o
[ 75%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc4w-gemm/gen/f32-qc4w-gemm-1x8-minmax-asm-aarch64-neonfma-ld64-acc2-prfm.S.o
[ 75%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc4w-gemm/gen/f32-qc4w-gemm-1x8-minmax-asm-aarch64-neonfma-ld64-acc2.S.o
[ 75%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc4w-gemm/gen/f32-qc4w-gemm-1x8-minmax-asm-aarch64-neonfma-ld64-acc4-prfm.S.o
[ 75%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc4w-gemm/gen/f32-qc4w-gemm-1x8-minmax-asm-aarch64-neonfma-ld64-acc4.S.o
[ 75%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc4w-gemm/gen/f32-qc4w-gemm-1x8-minmax-asm-aarch64-neonfma-ld64-prfm.S.o
[ 75%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc4w-gemm/gen/f32-qc4w-gemm-1x8-minmax-asm-aarch64-neonfma-ld64.S.o
[ 75%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc4w-gemm/gen/f32-qc4w-gemm-1x8-minmax-asm-aarch64-neonfma-ld128-acc2-prfm.S.o
[ 75%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc4w-gemm/gen/f32-qc4w-gemm-1x8-minmax-asm-aarch64-neonfma-ld128-acc2.S.o
[ 75%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc4w-gemm/gen/f32-qc4w-gemm-1x8-minmax-asm-aarch64-neonfma-ld128-acc4-prfm.S.o
[ 75%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc4w-gemm/gen/f32-qc4w-gemm-1x8-minmax-asm-aarch64-neonfma-ld128-acc4.S.o
[ 75%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc4w-gemm/gen/f32-qc4w-gemm-1x8-minmax-asm-aarch64-neonfma-ld128-prfm.S.o
[ 75%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc4w-gemm/gen/f32-qc4w-gemm-1x8-minmax-asm-aarch64-neonfma-ld128.S.o
[ 75%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc4w-gemm/gen/f32-qc4w-gemm-4x1-minmax-asm-aarch64-neonfma-ld64.S.o
[ 75%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc4w-gemm/gen/f32-qc4w-gemm-4x1-minmax-asm-aarch64-neonfma-ld128.S.o
[ 75%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc4w-gemm/gen/f32-qc4w-gemm-4x2-minmax-asm-aarch64-neonfma-ld64.S.o
[ 75%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc4w-gemm/gen/f32-qc4w-gemm-4x2-minmax-asm-aarch64-neonfma-ld128.S.o
[ 75%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc4w-gemm/gen/f32-qc4w-gemm-4x8-minmax-asm-aarch64-neonfma-ld64.S.o
[ 75%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc4w-gemm/gen/f32-qc4w-gemm-4x8-minmax-asm-aarch64-neonfma-ld128.S.o
[ 75%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc4w-gemm/gen/f32-qc4w-gemm-6x8-minmax-asm-aarch64-neonfma-ld64.S.o
[ 75%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc4w-gemm/gen/f32-qc4w-gemm-6x8-minmax-asm-aarch64-neonfma-ld128.S.o
[ 75%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-1x8-minmax-asm-aarch64-neon-ld128-acc2-prfm.S.o
[ 75%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-1x8-minmax-asm-aarch64-neon-ld128-acc2.S.o
[ 75%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-1x8-minmax-asm-aarch64-neonfma-ld64-acc2-prfm.S.o
[ 75%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-1x8-minmax-asm-aarch64-neonfma-ld64-acc2.S.o
[ 75%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/sparse/SparseUnaryOps.cpp.o
[ 75%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-1x8-minmax-asm-aarch64-neonfma-ld64-acc4-prfm.S.o
[ 75%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-1x8-minmax-asm-aarch64-neonfma-ld64-acc4.S.o
[ 75%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-1x8-minmax-asm-aarch64-neonfma-ld64-prfm.S.o
[ 75%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-1x8-minmax-asm-aarch64-neonfma-ld64.S.o
[ 75%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-1x8-minmax-asm-aarch64-neonfma-ld128-acc2-prfm.S.o
[ 75%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-1x8-minmax-asm-aarch64-neonfma-ld128-acc2.S.o
[ 75%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-1x8-minmax-asm-aarch64-neonfma-ld128-acc4-prfm.S.o
[ 75%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-1x8-minmax-asm-aarch64-neonfma-ld128-acc4.S.o
[ 75%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-1x8-minmax-asm-aarch64-neonfma-ld128-prfm.S.o
[ 75%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-1x8-minmax-asm-aarch64-neonfma-ld128.S.o
[ 75%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-4x1-minmax-asm-aarch64-neonfma-ld64.S.o
[ 75%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-4x1-minmax-asm-aarch64-neonfma-ld128.S.o
[ 75%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-4x2-minmax-asm-aarch64-neonfma-ld64.S.o
[ 75%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-4x2-minmax-asm-aarch64-neonfma-ld128.S.o
[ 75%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-4x8-minmax-asm-aarch64-neonfma-ld64.S.o
[ 75%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-4x8-minmax-asm-aarch64-neonfma-ld128.S.o
[ 75%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-6x8-minmax-asm-aarch64-neonfma-ld64.S.o
[ 75%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-6x8-minmax-asm-aarch64-neonfma-ld128.S.o
[ 75%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc8w-gemm/gen/qd8-f16-qc8w-gemm-4x16c4-minmax-asm-aarch64-neondot-ld128.S.o
[ 75%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc8w-gemm/gen/qd8-f16-qc8w-gemm-4x16c4-minmax-asm-aarch64-neondotfp16arith-cortex-a55.S.o
[ 75%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc8w-igemm/gen/qd8-f16-qc8w-igemm-4x16c4-minmax-asm-aarch64-neondot-cortex-a55.S.o
[ 75%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc8w-igemm/gen/qd8-f16-qc8w-igemm-4x16c4-minmax-asm-aarch64-neondot-ld128.S.o
[ 75%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-4x16c4-minmax-asm-aarch64-neondot-cortex-a55.S.o
[ 76%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-4x16c4-minmax-asm-aarch64-neondot-ld64.S.o
[ 76%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-4x16c4-minmax-asm-aarch64-neondot-ld128.S.o
[ 76%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-4x16c4-minmax-asm-aarch64-neondot-cortex-a55.S.o
[ 76%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-4x16c4-minmax-asm-aarch64-neondot-ld128.S.o
[ 76%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-1x8c8-minmax-fp32-asm-aarch64-neon-mlal-cortex-a53-prfm.S.o
[ 76%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-1x8c8-minmax-fp32-asm-aarch64-neon-mlal-cortex-a53.S.o
[ 76%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-1x8c8-minmax-fp32-asm-aarch64-neon-mlal-prfm.S.o
[ 76%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-1x8c8-minmax-fp32-asm-aarch64-neon-mlal.S.o
[ 76%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-1x16c4-minmax-fp32-asm-aarch64-neondot-ld32.S.o
[ 76%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-1x16c4-minmax-fp32-asm-aarch64-neondot-ld64.S.o
[ 76%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-2x8c8-minmax-fp32-asm-aarch64-neon-mlal-cortex-a53-prfm.S.o
[ 76%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-2x8c8-minmax-fp32-asm-aarch64-neon-mlal-cortex-a53.S.o
[ 76%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-2x8c8-minmax-fp32-asm-aarch64-neon-mlal-prfm.S.o
[ 76%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-2x8c8-minmax-fp32-asm-aarch64-neon-mlal.S.o
[ 76%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-2x8c8-minmax-fp32-asm-aarch64-neon-mull.S.o
[ 76%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-2x8c16-minmax-fp32-asm-aarch64-neon-mlal.S.o
[ 76%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-4x16-minmax-fp32-asm-aarch64-neon-mlal-lane-cortex-a53-prfm.S.o
[ 76%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-4x16-minmax-fp32-asm-aarch64-neon-mlal-lane-cortex-a53.S.o
[ 76%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-4x16-minmax-fp32-asm-aarch64-neon-mlal-lane-ld64-prfm.S.o
[ 76%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-4x16-minmax-fp32-asm-aarch64-neon-mlal-lane-ld64.S.o
[ 76%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-4x16c4-minmax-fp32-asm-aarch64-neondot-cortex-a55.S.o
[ 76%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-4x16c4-minmax-fp32-asm-aarch64-neondot-ld32.S.o
[ 76%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-4x16c4-minmax-fp32-asm-aarch64-neondot-ld64.S.o
[ 76%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-4x16c4-minmax-fp32-asm-aarch64-neondot-ld128.S.o
[ 76%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-1x8c8-minmax-fp32-asm-aarch64-neon-mlal-cortex-a53-prfm.S.o
[ 76%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-1x8c8-minmax-fp32-asm-aarch64-neon-mlal-cortex-a53.S.o
[ 76%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-1x8c8-minmax-fp32-asm-aarch64-neon-mlal-prfm.S.o
[ 76%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-1x8c8-minmax-fp32-asm-aarch64-neon-mlal.S.o
[ 76%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-2x8c8-minmax-fp32-asm-aarch64-neon-mlal-cortex-a53-prfm.S.o
[ 76%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-2x8c8-minmax-fp32-asm-aarch64-neon-mlal-cortex-a53.S.o
[ 76%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-2x8c8-minmax-fp32-asm-aarch64-neon-mlal-prfm.S.o
[ 76%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-2x8c8-minmax-fp32-asm-aarch64-neon-mlal.S.o
[ 76%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-2x8c16-minmax-fp32-asm-aarch64-neon-mlal.S.o
[ 76%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-4x16-minmax-fp32-asm-aarch64-neon-mlal-lane-cortex-a53-prfm.S.o
[ 76%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-4x16-minmax-fp32-asm-aarch64-neon-mlal-lane-cortex-a53.S.o
[ 76%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-4x16-minmax-fp32-asm-aarch64-neon-mlal-lane-ld64-prfm.S.o
[ 76%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-4x16-minmax-fp32-asm-aarch64-neon-mlal-lane-ld64.S.o
[ 76%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-4x16c4-minmax-fp32-asm-aarch64-neondot-cortex-a55.S.o
[ 76%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-4x16c4-minmax-fp32-asm-aarch64-neondot-ld64.S.o
[ 76%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-4x16c4-minmax-fp32-asm-aarch64-neondot-ld128.S.o
[ 76%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-4x8c4-minmax-rndnu-asm-aarch64-neondot-cortex-a55.S.o
[ 76%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-4x8c4-minmax-rndnu-asm-aarch64-neondot-ld128.S.o
[ 76%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-4x16-minmax-rndnu-asm-aarch64-neon-mlal-lane-cortex-a53-prfm.S.o
[ 76%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-4x16-minmax-rndnu-asm-aarch64-neon-mlal-lane-cortex-a53.S.o
[ 76%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-4x16-minmax-rndnu-asm-aarch64-neon-mlal-lane-cortex-a75-prfm.S.o
[ 76%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-4x16-minmax-rndnu-asm-aarch64-neon-mlal-lane-cortex-a75.S.o
[ 76%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-4x16-minmax-rndnu-asm-aarch64-neon-mlal-lane-ld64-prfm.S.o
[ 76%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-4x16-minmax-rndnu-asm-aarch64-neon-mlal-lane-ld64.S.o
[ 76%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-4x16c4-minmax-fp32-asm-aarch64-neondot-cortex-a55.S.o
[ 76%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-4x16c4-minmax-fp32-asm-aarch64-neondot-ld128.S.o
[ 76%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-4x16c4-minmax-rndnu-asm-aarch64-neondot-cortex-a55.S.o
[ 76%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-4x16c4-minmax-rndnu-asm-aarch64-neondot-ld128.S.o
[ 76%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-4x8c4-minmax-rndnu-asm-aarch64-neondot-cortex-a55.S.o
[ 76%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-4x8c4-minmax-rndnu-asm-aarch64-neondot-ld128.S.o
[ 76%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-4x16-minmax-rndnu-asm-aarch64-neon-mlal-lane-cortex-a53-prfm.S.o
[ 76%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-4x16-minmax-rndnu-asm-aarch64-neon-mlal-lane-cortex-a53.S.o
[ 76%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-4x16-minmax-rndnu-asm-aarch64-neon-mlal-lane-cortex-a75-prfm.S.o
[ 76%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-4x16-minmax-rndnu-asm-aarch64-neon-mlal-lane-cortex-a75.S.o
[ 76%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-4x16-minmax-rndnu-asm-aarch64-neon-mlal-lane-ld64-prfm.S.o
[ 76%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-4x16-minmax-rndnu-asm-aarch64-neon-mlal-lane-ld64.S.o
[ 77%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-4x16c4-minmax-fp32-asm-aarch64-neondot-cortex-a55.S.o
[ 77%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-4x16c4-minmax-fp32-asm-aarch64-neondot-ld128.S.o
[ 77%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-4x16c4-minmax-rndnu-asm-aarch64-neondot-cortex-a55.S.o
[ 77%] Building ASM object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-4x16c4-minmax-rndnu-asm-aarch64-neondot-ld128.S.o
[ 77%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/tables/exp2-k-over-64.c.o
[ 77%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/tables/exp2-k-over-2048.c.o
[ 77%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/tables/exp2minus-k-over-4.c.o
[ 77%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/tables/exp2minus-k-over-8.c.o
[ 77%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/tables/exp2minus-k-over-16.c.o
[ 77%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/tables/exp2minus-k-over-32.c.o
[ 77%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/tables/exp2minus-k-over-64.c.o
[ 77%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/tables/exp2minus-k-over-2048.c.o
[ 77%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/tables/vlog.c.o
[ 77%] Built target microkernels-all
[ 77%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/sparse/ValidateCompressedIndicesKernel.cpp.o
[ 77%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/nested/NestedTensorAliases.cpp.o
[ 77%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/nested/NestedTensorBackward.cpp.o
[ 77%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/nested/NestedTensorBinaryOps.cpp.o
[ 77%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/nested/NestedTensorFactories.cpp.o
[ 77%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/nested/NestedTensorMath.cpp.o
[ 77%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/nested/NestedTensorMatmul.cpp.o
[ 77%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/nested/NestedTensorTransformerFunctions.cpp.o
[ 77%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/nested/NestedTensorUnaryOps.cpp.o
[ 77%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/nested/NestedTensorUtils.cpp.o
[ 77%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/AffineQuantizer.cpp.o
[ 77%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/AffineQuantizerBase.cpp.o
[ 77%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/Copy.cpp.o
[ 77%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/FakeQuantPerChannelAffine.cpp.o
[ 77%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/FakeQuantPerTensorAffine.cpp.o
[ 77%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/QTensor.cpp.o
[ 77%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/TensorAdvancedIndexing.cpp.o
[ 77%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/TensorCompare.cpp.o
[ 77%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/TensorFactories.cpp.o
[ 77%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/cpu/AdaptiveAveragePooling.cpp.o
[ 77%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/cpu/AveragePool2d.cpp.o
[ 77%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/cpu/AveragePool3d.cpp.o
[ 77%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/cpu/BinaryOps.cpp.o
[ 77%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/cpu/ChannelShuffle.cpp.o
[ 77%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/cpu/IntReprQuant.cpp.o
[ 77%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/cpu/LinearUnpackImpl.cpp.o
[ 77%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/cpu/MakePerTensorQuantizedTensor.cpp.o
[ 77%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/cpu/Normalization.cpp.o
[ 77%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/cpu/Pooling.cpp.o
[ 77%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/cpu/ReduceOps.cpp.o
[ 77%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/cpu/RuyUtils.cpp.o
[ 77%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/cpu/Sorting.cpp.o
[ 77%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/cpu/TensorOperators.cpp.o
[ 77%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/cpu/TensorShape.cpp.o
[ 77%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/cpu/UpSampleBilinear2d.cpp.o
[ 77%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/cpu/UpSampleNearest2d.cpp.o
[ 77%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/cpu/UpSampleNearest3d.cpp.o
[ 78%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/cpu/XnnpackUtils.cpp.o
[ 78%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/cpu/fbgemm_utils.cpp.o
[ 78%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/cpu/fused_obs_fake_quant.cpp.o
[ 78%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/cpu/init_qnnpack.cpp.o
[ 78%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/cpu/qclamp.cpp.o
[ 78%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/cpu/qconv.cpp.o
[ 78%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/cpu/qconv_dynamic.cpp.o
[ 78%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/cpu/qconv_prepack.cpp.o
[ 78%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/cpu/qconv_unpack_impl.cpp.o
[ 78%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/cpu/qdropout.cpp.o
[ 78%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/cpu/qelu.cpp.o
[ 78%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/cpu/qembeddingbag.cpp.o
[ 78%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/cpu/qembeddingbag_prepack.cpp.o
[ 78%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/cpu/qembeddingbag_unpack.cpp.o
[ 78%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/cpu/qgelu.cpp.o
[ 78%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/cpu/qhardsigmoid.cpp.o
[ 78%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/cpu/qhardswish.cpp.o
[ 78%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/cpu/qlinear.cpp.o
[ 78%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/cpu/qlinear_dynamic.cpp.o
[ 78%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/cpu/qlinear_prepack.cpp.o
[ 78%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/cpu/qmatmul.cpp.o
[ 78%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/cpu/qmul.cpp.o
[ 78%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/cpu/qnormalization.cpp.o
[ 78%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/cpu/qrelu.cpp.o
[ 78%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/cpu/qsigmoid.cpp.o
[ 78%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/cpu/qsoftmax.cpp.o
[ 78%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/cpu/qtanh.cpp.o
[ 78%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/cpu/qthreshold.cpp.o
[ 78%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/library.cpp.o
[ 78%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/qconv_unpack.cpp.o
[ 78%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/qlinear_unpack.cpp.o
[ 78%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/mkl/LinearAlgebra.cpp.o
[ 78%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/mkl/SparseBlasImpl.cpp.o
[ 78%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/mkl/SparseCsrLinearAlgebra.cpp.o
[ 78%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/mkl/SpectralOps.cpp.o
[ 78%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/mkldnn/BinaryOps.cpp.o
[ 78%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/mkldnn/Conv.cpp.o
[ 78%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/mkldnn/ConvPrepack.cpp.o
[ 78%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/mkldnn/Copy.cpp.o
[ 78%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/mkldnn/Gelu.cpp.o
[ 78%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/mkldnn/IDeepRegistration.cpp.o
[ 78%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/mkldnn/Linear.cpp.o
[ 78%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/mkldnn/MKLDNNCommon.cpp.o
[ 78%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/mkldnn/MKLDNNConversions.cpp.o
[ 78%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/mkldnn/Matmul.cpp.o
[ 78%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/mkldnn/MkldnnTensorMath.cpp.o
[ 78%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/mkldnn/Normalization.cpp.o
[ 78%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/mkldnn/OpContext.cpp.o
[ 78%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/mkldnn/Pooling.cpp.o
[ 78%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/mkldnn/Prelu.cpp.o
[ 78%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/mkldnn/RNN.cpp.o
[ 78%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/mkldnn/RegisterMkldnnOpContextClass.cpp.o
[ 78%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/mkldnn/Relu.cpp.o
[ 78%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/mkldnn/SoftMax.cpp.o
[ 78%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/mkldnn/TensorFactories.cpp.o
[ 78%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/mkldnn/TensorShape.cpp.o
[ 78%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/mkldnn/UnaryOps.cpp.o
[ 78%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/mkldnn/Utils.cpp.o
[ 78%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/transformers/attention.cpp.o
[ 79%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/transformers/sdp_utils_cpp.cpp.o
[ 79%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/transformers/transformer.cpp.o
[ 79%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/utils/Factory.cpp.o
[ 79%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/xnnpack/Activation.cpp.o
[ 79%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/xnnpack/AveragePooling.cpp.o
[ 79%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/xnnpack/ChannelShuffle.cpp.o
[ 79%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/xnnpack/Convolution.cpp.o
[ 79%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/xnnpack/Init.cpp.o
[ 79%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/xnnpack/Linear.cpp.o
[ 79%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/xnnpack/MaxPooling.cpp.o
[ 79%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/xnnpack/OpContext.cpp.o
[ 79%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/xnnpack/RegisterOpContextClass.cpp.o
[ 79%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/xnnpack/Shim.cpp.o
[ 79%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/CompositeViewCopyKernels.cpp.o
[ 79%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/Functions.cpp.o
[ 79%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/Operators_0.cpp.o
[ 79%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/Operators_1.cpp.o
[ 79%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/Operators_2.cpp.o
[ 79%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/Operators_3.cpp.o
[ 79%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/Operators_4.cpp.o
[ 79%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/RegisterBackendSelect.cpp.o
[ 79%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/RegisterCPU.cpp.o
[ 79%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/RegisterCompositeExplicitAutograd.cpp.o
[ 79%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/RegisterCompositeExplicitAutogradNonFunctional.cpp.o
[ 79%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/RegisterCompositeImplicitAutograd.cpp.o
[ 79%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/RegisterCompositeImplicitAutogradNestedTensor.cpp.o
[ 79%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/RegisterFunctionalization_0.cpp.o
[ 79%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/RegisterFunctionalization_1.cpp.o
[ 79%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/RegisterFunctionalization_2.cpp.o
[ 79%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/RegisterFunctionalization_3.cpp.o
[ 79%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/RegisterMeta.cpp.o
[ 79%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/RegisterMkldnnCPU.cpp.o
[ 79%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/RegisterNestedTensorCPU.cpp.o
[ 79%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/RegisterNestedTensorMeta.cpp.o
[ 79%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/RegisterQuantizedCPU.cpp.o
[ 79%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/RegisterQuantizedMeta.cpp.o
[ 79%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/RegisterSchema.cpp.o
[ 79%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/RegisterSparseCPU.cpp.o
[ 79%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/RegisterSparseCsrCPU.cpp.o
[ 79%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/RegisterSparseCsrMeta.cpp.o
[ 79%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/RegisterSparseMeta.cpp.o
[ 79%] Building CXX object
caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/RegisterZeroTensor.cpp.o [ 79%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/UfuncCPU_add.cpp.o [ 79%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/core/ATenOpList.cpp.o [ 79%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/core/TensorMethods.cpp.o [ 79%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/quantized/QTensorImpl.cpp.o [ 79%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/quantized/Quantizer.cpp.o [ 79%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/nnapi/nnapi_bind.cpp.o [ 79%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/nnapi/nnapi_model_loader.cpp.o [ 79%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/nnapi/nnapi_register.cpp.o [ 79%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/nnapi/nnapi_wrapper.cpp.o [ 79%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/UfuncCPUKernel_add.cpp.DEFAULT.cpp.o [ 79%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/cpu/kernels/QuantizedOpKernels.cpp.DEFAULT.cpp.o [ 79%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/spherical_bessel_j0.cpp.DEFAULT.cpp.o [ 79%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/scaled_modified_bessel_k1.cpp.DEFAULT.cpp.o [ 79%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/scaled_modified_bessel_k0.cpp.DEFAULT.cpp.o [ 79%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/layer_norm_kernel.cpp.DEFAULT.cpp.o [ 79%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/int8mm_kernel.cpp.DEFAULT.cpp.o [ 79%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/int4mm_kernel.cpp.DEFAULT.cpp.o [ 79%] 
Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/group_norm_kernel.cpp.DEFAULT.cpp.o [ 80%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/batch_norm_kernel.cpp.DEFAULT.cpp.o [ 80%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/airy_ai.cpp.DEFAULT.cpp.o [ 80%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/WeightNormKernel.cpp.DEFAULT.cpp.o [ 80%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/UpSampleMoreKernel.cpp.DEFAULT.cpp.o [ 80%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/UpSampleKernel.cpp.DEFAULT.cpp.o [ 80%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/UnfoldBackwardKernel.cpp.DEFAULT.cpp.o [ 80%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/Unfold2d.cpp.DEFAULT.cpp.o [ 80%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/UnaryOpsKernel.cpp.DEFAULT.cpp.o [ 80%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/TensorCompareKernel.cpp.DEFAULT.cpp.o [ 80%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/SumKernel.cpp.DEFAULT.cpp.o [ 80%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/StackKernel.cpp.DEFAULT.cpp.o [ 80%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/SpmmReduceKernel.cpp.DEFAULT.cpp.o [ 80%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/SparseFactories.cpp.DEFAULT.cpp.o [ 80%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/SortingKernel.cpp.DEFAULT.cpp.o [ 80%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/SoftMaxKernel.cpp.DEFAULT.cpp.o [ 80%] Building CXX object 
caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/ScatterGatherKernel.cpp.DEFAULT.cpp.o [ 80%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/SampledAddmmKernel.cpp.DEFAULT.cpp.o [ 80%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/RenormKernel.cpp.DEFAULT.cpp.o [ 80%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/ReduceOpsKernel.cpp.DEFAULT.cpp.o [ 80%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/ReduceAllOpsKernel.cpp.DEFAULT.cpp.o [ 80%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/RangeFactoriesKernel.cpp.DEFAULT.cpp.o [ 80%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/PowKernel.cpp.DEFAULT.cpp.o [ 80%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/PointwiseOpsKernel.cpp.DEFAULT.cpp.o [ 80%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/PixelShuffleKernel.cpp.DEFAULT.cpp.o [ 80%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/PaddingKernel.cpp.DEFAULT.cpp.o [ 80%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/NativeMultiheadAttnKernel.cpp.DEFAULT.cpp.o [ 80%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/MultinomialKernel.cpp.DEFAULT.cpp.o [ 80%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/MaxUnpoolKernel.cpp.DEFAULT.cpp.o [ 80%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/MaxPooling.cpp.DEFAULT.cpp.o [ 80%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/MaxPoolKernel.cpp.DEFAULT.cpp.o [ 80%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/LinearAlgebraKernel.cpp.DEFAULT.cpp.o [ 80%] Building CXX object 
caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/LerpKernel.cpp.DEFAULT.cpp.o [ 80%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/IndexKernel.cpp.DEFAULT.cpp.o [ 80%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/HistogramKernel.cpp.DEFAULT.cpp.o [ 80%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/GridSamplerKernel.cpp.DEFAULT.cpp.o [ 80%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/FunctionOfAMatrixUtilsKernel.cpp.DEFAULT.cpp.o [ 80%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/FlashAttentionKernel.cpp.DEFAULT.cpp.o [ 80%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/FillKernel.cpp.DEFAULT.cpp.o [ 80%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/DistributionKernels.cpp.DEFAULT.cpp.o [ 80%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/DistanceOpsKernel.cpp.DEFAULT.cpp.o [ 80%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/DepthwiseConvKernel.cpp.DEFAULT.cpp.o [ 80%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/CrossKernel.cpp.DEFAULT.cpp.o [ 80%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/CopyKernel.cpp.DEFAULT.cpp.o [ 80%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/ComplexKernel.cpp.DEFAULT.cpp.o [ 80%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/ChannelShuffleKernel.cpp.DEFAULT.cpp.o [ 80%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/CatKernel.cpp.DEFAULT.cpp.o [ 80%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/BlasKernel.cpp.DEFAULT.cpp.o [ 80%] Building CXX object 
caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/BinaryOpsKernel.cpp.DEFAULT.cpp.o [ 80%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/AvgPoolKernel.cpp.DEFAULT.cpp.o [ 80%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/AmpGradScalerKernels.cpp.DEFAULT.cpp.o [ 80%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/AdaptiveMaxPoolKernel.cpp.DEFAULT.cpp.o [ 80%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/AdaptiveAvgPoolKernel.cpp.DEFAULT.cpp.o [ 80%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/Activation.cpp.DEFAULT.cpp.o [ 80%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/vulkan/Context.cpp.o [ 80%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/metal/Context.cpp.o [ 80%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/core/common.cc.o [ 80%] Building C object caffe2/CMakeFiles/torch_cpu.dir/__/third_party/miniz-2.1.0/miniz.c.o
/builddir/build/BUILD/pytorch/third_party/miniz-2.1.0/miniz.c:3157:9: note: ‘#pragma message: Using fopen, ftello, fseeko, stat() etc. path for file I/O - this path may not support large files.’
 3157 | #pragma message("Using fopen, ftello, fseeko, stat() etc. path for file I/O - this path may not support large files.")
      |         ^~~~~~~
[ 80%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/serialize/inline_container.cc.o [ 80%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/serialize/istream_adapter.cc.o [ 80%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/serialize/file_adapter.cc.o [ 81%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/serialize/crc.cc.o [ 81%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/serialize/read_adapter_interface.cc.o [ 81%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/utils/string_utils.cc.o [ 81%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/utils/threadpool/ThreadPool.cc.o [ 81%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/utils/threadpool/pthreadpool-cpp.cc.o [ 81%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/utils/threadpool/thread_pool_guard.cpp.o [ 81%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/utils/proto_wrap.cc.o [ 81%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/perfkernels/adagrad.cc.o [ 81%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/perfkernels/batch_box_cox.cc.o [ 81%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/perfkernels/embedding_lookup.cc.o [ 81%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/perfkernels/embedding_lookup_idx.cc.o [ 81%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/perfkernels/fused_8bit_rowwise_embedding_lookup.cc.o [ 81%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/perfkernels/fused_8bit_rowwise_embedding_lookup_idx.cc.o [ 81%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/perfkernels/fused_nbit_rowwise_conversion.cc.o [ 81%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/perfkernels/lstm_unit_cpu_common.cc.o [ 81%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/perfkernels/math_cpu_base.cc.o [ 81%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/perfkernels/typed_axpy.cc.o [ 81%] Building CXX object
caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/autograd/generated/Functions.cpp.o [ 81%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/autograd/generated/ViewFuncs.cpp.o [ 81%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/autograd/generated/VariableType_0.cpp.o [ 81%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/autograd/generated/VariableType_1.cpp.o [ 81%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/autograd/generated/VariableType_2.cpp.o [ 81%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/autograd/generated/VariableType_3.cpp.o [ 81%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/autograd/generated/VariableType_4.cpp.o [ 81%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/autograd/generated/TraceType_0.cpp.o [ 81%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/autograd/generated/TraceType_1.cpp.o [ 81%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/autograd/generated/TraceType_2.cpp.o [ 81%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/autograd/generated/TraceType_3.cpp.o [ 81%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/autograd/generated/TraceType_4.cpp.o [ 81%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/autograd/generated/ADInplaceOrViewType_0.cpp.o [ 81%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/autograd/generated/ADInplaceOrViewType_1.cpp.o [ 81%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/inductor/aoti_torch/generated/c_shim_cpu.cpp.o [ 81%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/lazy/generated/LazyNativeFunctions.cpp.o [ 81%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/lazy/generated/RegisterAutogradLazy.cpp.o [ 81%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/lazy/generated/RegisterLazy.cpp.o 
[ 81%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/autograd/anomaly_mode.cpp.o [ 81%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/autograd/autograd.cpp.o [ 81%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/autograd/autograd_meta.cpp.o [ 81%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/autograd/autograd_not_implemented_fallback.cpp.o [ 81%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/autograd/cpp_hook.cpp.o [ 81%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/autograd/custom_function.cpp.o [ 81%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/autograd/engine.cpp.o [ 81%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/autograd/forward_grad.cpp.o [ 81%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/autograd/function.cpp.o [ 81%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/autograd/functions/accumulate_grad.cpp.o [ 81%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/autograd/functions/basic_ops.cpp.o [ 81%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/autograd/functions/tensor.cpp.o [ 81%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/autograd/functions/utils.cpp.o [ 81%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/autograd/input_buffer.cpp.o [ 81%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/autograd/input_metadata.cpp.o [ 81%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/autograd/jit_decomp_interface.cpp.o [ 81%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/autograd/profiler_kineto.cpp.o [ 81%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/autograd/profiler_legacy.cpp.o [ 81%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/autograd/record_function_ops.cpp.o [ 81%] Building CXX 
object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/autograd/saved_variable.cpp.o [ 81%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/autograd/utils/warnings.cpp.o [ 81%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/autograd/variable.cpp.o [ 81%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/autograd/variable_info.cpp.o [ 81%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/inductor/aoti_runner/model_container_runner.cpp.o [ 82%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/inductor/aoti_runner/model_container_runner_cpu.cpp.o [ 82%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/inductor/aoti_torch/shim_common.cpp.o [ 82%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/inductor/aoti_torch/tensor_converter.cpp.o [ 82%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/inductor/inductor_ops.cpp.o [ 82%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/api/function_impl.cpp.o [ 82%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/api/module.cpp.o [ 82%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/api/object.cpp.o [ 82%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/backends/backend_debug_handler.cpp.o [ 82%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/backends/backend_debug_info.cpp.o [ 82%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/backends/backend_detail.cpp.o [ 82%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/backends/backend_interface.cpp.o [ 82%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/backends/backend_resolver.cpp.o [ 82%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/codegen/fuser/codegen.cpp.o [ 82%] Building CXX object 
caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/codegen/fuser/compiler.cpp.o [ 82%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/codegen/fuser/executor.cpp.o [ 82%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/codegen/fuser/fallback.cpp.o [ 82%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/codegen/fuser/interface.cpp.o [ 82%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/codegen/fuser/kernel_cache.cpp.o [ 82%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/frontend/builtin_functions.cpp.o [ 82%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/frontend/canonicalize_modified_loop.cpp.o [ 82%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/frontend/convert_to_ssa.cpp.o [ 82%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/frontend/edit_distance.cpp.o [ 82%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/frontend/exit_transforms.cpp.o [ 82%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/frontend/inline_loop_condition.cpp.o [ 82%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/frontend/ir_emitter.cpp.o [ 82%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/frontend/name_mangler.cpp.o [ 82%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/frontend/parser.cpp.o [ 82%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/frontend/schema_matching.cpp.o [ 82%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/frontend/script_type_parser.cpp.o [ 82%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/frontend/sugared_value.cpp.o [ 82%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/frontend/tracer.cpp.o [ 82%] Building CXX object 
caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/frontend/versioned_symbols.cpp.o [ 82%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/ir/alias_analysis.cpp.o [ 82%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/ir/attributes.cpp.o [ 82%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/ir/constants.cpp.o [ 82%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/ir/graph_utils.cpp.o [ 82%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/ir/ir.cpp.o [ 82%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/ir/irparser.cpp.o [ 82%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/ir/node_hashing.cpp.o [ 82%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/ir/scope.cpp.o [ 82%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/ir/subgraph_matcher.cpp.o [ 82%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/ir/type_hashing.cpp.o [ 82%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/jit_log.cpp.o [ 82%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/jit_opt_limit.cpp.o [ 82%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/mobile/compatibility/model_compatibility.cpp.o [ 82%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/mobile/compatibility/runtime_compatibility.cpp.o [ 82%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/mobile/flatbuffer_loader.cpp.o [ 82%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/mobile/function.cpp.o [ 82%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/mobile/import.cpp.o [ 82%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/mobile/interpreter.cpp.o [ 82%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/mobile/module.cpp.o [ 82%] 
Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/mobile/nnc/aot_compiler.cpp.o [ 82%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/mobile/nnc/backend.cpp.o [ 82%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/mobile/nnc/context.cpp.o [ 82%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/mobile/nnc/registry.cpp.o [ 82%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/mobile/observer.cpp.o [ 82%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/mobile/parse_bytecode.cpp.o [ 82%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/mobile/parse_operators.cpp.o [ 82%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/mobile/prim_ops_registery.cpp.o [ 82%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/mobile/promoted_prim_ops.cpp.o [ 83%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/mobile/quantization.cpp.o [ 83%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/mobile/register_ops_common_utils.cpp.o [ 83%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/mobile/type_parser.cpp.o [ 83%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/mobile/upgrader_mobile.cpp.o [ 83%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/operator_upgraders/upgraders.cpp.o [ 83%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/operator_upgraders/upgraders_entry.cpp.o [ 83%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/operator_upgraders/utils.cpp.o [ 83%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/operator_upgraders/version_map.cpp.o [ 83%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/add_if_then_else.cpp.o [ 83%] Building CXX object 
caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/annotate_warns.cpp.o [ 83%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/bailout_graph.cpp.o [ 83%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/batch_mm.cpp.o [ 83%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/canonicalize.cpp.o [ 83%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/canonicalize_graph_fuser_ops.cpp.o [ 83%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/check_strict_fusion.cpp.o [ 83%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/clear_profiling.cpp.o [ 83%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/clear_undefinedness.cpp.o [ 83%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/common_subexpression_elimination.cpp.o [ 83%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/concat_opt.cpp.o [ 83%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/constant_pooling.cpp.o [ 83%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/constant_propagation.cpp.o [ 83%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/create_autodiff_subgraphs.cpp.o [ 83%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/create_functional_graphs.cpp.o [ 83%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/dbr_quantization/remove_redundant_aliases.cpp.o [ 83%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/dead_code_elimination.cpp.o [ 83%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/decompose_ops.cpp.o [ 83%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/device_type_analysis.cpp.o [ 83%] Building CXX object 
caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/dtype_analysis.cpp.o [ 83%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/eliminate_no_ops.cpp.o [ 83%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/erase_number_types.cpp.o [ 83%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/fixup_trace_scope_blocks.cpp.o [ 83%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/fold_conv_bn.cpp.o [ 83%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/fold_linear_bn.cpp.o [ 83%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/freeze_module.cpp.o [ 83%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/frozen_concat_linear.cpp.o [ 83%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/frozen_conv_add_relu_fusion.cpp.o [ 83%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/frozen_conv_folding.cpp.o [ 83%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/frozen_graph_optimizations.cpp.o [ 83%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/frozen_linear_folding.cpp.o [ 83%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/frozen_linear_transpose.cpp.o [ 83%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/frozen_ops_to_mkldnn.cpp.o [ 83%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/fuse_linear.cpp.o [ 83%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/fuse_relu.cpp.o [ 83%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/graph_fuser.cpp.o [ 83%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/graph_rewrite_helper.cpp.o [ 83%] Building CXX object 
caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/guard_elimination.cpp.o
[ 83%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/hoist_conv_packed_params.cpp.o
[ 83%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/inline_autodiff_subgraphs.cpp.o
[ 83%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/inline_fork_wait.cpp.o
[ 83%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/inline_forked_closures.cpp.o
[ 83%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/inliner.cpp.o
[ 83%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/inplace_check.cpp.o
[ 83%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/insert_guards.cpp.o
[ 83%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/integer_value_refinement.cpp.o
[ 83%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/lift_closures.cpp.o
[ 83%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/liveness.cpp.o
[ 83%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/loop_unrolling.cpp.o
[ 83%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/lower_grad_of.cpp.o
[ 83%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/lower_tuples.cpp.o
[ 83%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/metal_rewrite.cpp.o
[ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/mkldnn_rewrite.cpp.o
[ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/normalize_ops.cpp.o
[ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/pass_manager.cpp.o
[ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/peephole.cpp.o
[ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/peephole_alias_sensitive.cpp.o
[ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/peephole_dict_idioms.cpp.o
[ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/peephole_list_idioms.cpp.o
[ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/peephole_non_tensor.cpp.o
[ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/prepack_folding.cpp.o
[ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/quantization/dedup_module_uses.cpp.o
[ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/quantization/finalize.cpp.o
[ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/quantization/fusion_passes.cpp.o
[ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/quantization/helper.cpp.o
[ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/quantization/insert_observers.cpp.o
[ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/quantization/insert_quant_dequant.cpp.o
[ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/quantization/quantization_type.cpp.o
[ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/quantization/register_packed_params.cpp.o
[ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/refine_tuple_types.cpp.o
[ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/remove_dropout.cpp.o
[ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/remove_exceptions.cpp.o
[ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/remove_expands.cpp.o
[ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/remove_mutation.cpp.o
[ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/remove_redundant_profiles.cpp.o
[ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/replacement_of_old_operators.cpp.o
[ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/requires_grad_analysis.cpp.o
[ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/restore_mutation.cpp.o
[ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/shape_analysis.cpp.o
[ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/specialize_autogradzero.cpp.o
[ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/subgraph_rewrite.cpp.o
[ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/symbolic_shape_analysis.cpp.o
[ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/symbolic_shape_cache.cpp.o
[ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/symbolic_shape_runtime_fusion.cpp.o
[ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/tensorexpr_fuser.cpp.o
[ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/update_differentiable_graph_requires_grad.cpp.o
[ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/utils/memory_dag.cpp.o
[ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/utils/op_registry.cpp.o
[ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/utils/optimization_utils.cpp.o
[ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/utils/subgraph_utils.cpp.o
[ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/value_refinement_utils.cpp.o
[ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/variadic_ops.cpp.o
[ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/vulkan_rewrite.cpp.o
[ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/xnnpack_rewrite.cpp.o
[ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/python/update_graph_executor_opt.cpp.o
[ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/python/utf8_decoding_ignore.cpp.o
[ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/runtime/argument_spec.cpp.o
[ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/runtime/autodiff.cpp.o
[ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/runtime/decomposition_registry.cpp.o
[ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/runtime/decomposition_registry_util.cpp.o
[ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/runtime/graph_executor.cpp.o
[ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/runtime/instruction.cpp.o
[ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/runtime/interpreter.cpp.o
[ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/runtime/interpreter/frame.cpp.o
[ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/runtime/interpreter/preprocess_graph.cpp.o
[ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/runtime/jit_exception.cpp.o
[ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/runtime/jit_trace.cpp.o
[ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/runtime/logging.cpp.o
[ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/runtime/operator.cpp.o
[ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/runtime/print_handler.cpp.o
[ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/runtime/profiling_graph_executor_impl.cpp.o
[ 85%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/runtime/profiling_record.cpp.o
[ 85%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/runtime/register_ops_utils.cpp.o
[ 85%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/runtime/script_profile.cpp.o
[ 85%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/runtime/serialized_shape_function_registry.cpp.o
[ 85%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/runtime/simple_graph_executor_impl.cpp.o
[ 85%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/runtime/slice_indices_adjust.cpp.o
[ 85%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/runtime/static/fusion.cpp.o
[ 85%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/runtime/static/generated_ops.cpp.o
[ 85%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/runtime/static/impl.cpp.o
[ 85%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/runtime/static/memory_planner.cpp.o
[ 85%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/runtime/static/native_ops.cpp.o
[ 85%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/runtime/static/ops.cpp.o
[ 85%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/runtime/static/passes.cpp.o
[ 85%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/runtime/static/te_wrapper.cpp.o
[ 85%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/runtime/symbolic_script.cpp.o
[ 85%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/runtime/symbolic_shape_registry.cpp.o
[ 85%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/runtime/symbolic_shape_registry_util.cpp.o
[ 85%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/runtime/vararg_functions.cpp.o
[ 85%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/serialization/callstack_debug_info_serialization.cpp.o
[ 85%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/serialization/import.cpp.o
[ 85%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/serialization/import_export_helpers.cpp.o
[ 85%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/serialization/import_read.cpp.o
[ 85%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/serialization/import_source.cpp.o
[ 85%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/serialization/pickle.cpp.o
[ 85%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/serialization/pickler.cpp.o
[ 85%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/serialization/python_print.cpp.o
[ 85%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/serialization/source_range_serialization.cpp.o
[ 85%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/serialization/type_name_uniquer.cpp.o
[ 85%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/serialization/unpickler.cpp.o
[ 85%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/tensorexpr/block_codegen.cpp.o
[ 85%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/tensorexpr/bounds_inference.cpp.o
[ 85%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/tensorexpr/bounds_overlap.cpp.o
[ 85%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/tensorexpr/codegen.cpp.o
[ 85%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/tensorexpr/cpp_codegen.cpp.o
[ 85%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/tensorexpr/eval.cpp.o
[ 85%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/tensorexpr/expr.cpp.o
[ 85%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/tensorexpr/external_functions.cpp.o
[ 85%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/tensorexpr/external_functions_codegen.cpp.o
[ 85%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/tensorexpr/external_functions_core.cpp.o
[ 85%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/tensorexpr/external_functions_registry.cpp.o
[ 85%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/tensorexpr/graph_opt.cpp.o
[ 85%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/tensorexpr/hash_provider.cpp.o
[ 85%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/tensorexpr/intrinsic_symbols.cpp.o
[ 85%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/tensorexpr/ir.cpp.o
[ 85%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/tensorexpr/ir_cloner.cpp.o
[ 85%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/tensorexpr/ir_mutator.cpp.o
[ 85%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/tensorexpr/ir_printer.cpp.o
[ 85%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/tensorexpr/ir_simplifier.cpp.o
[ 85%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/tensorexpr/ir_verifier.cpp.o
[ 85%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/tensorexpr/ir_visitor.cpp.o
[ 85%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/tensorexpr/kernel.cpp.o
[ 85%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/tensorexpr/llvm_codegen.cpp.o
[ 85%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/tensorexpr/llvm_jit.cpp.o
[ 85%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/tensorexpr/loopnest.cpp.o
[ 85%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/tensorexpr/loopnest_randomization.cpp.o
[ 85%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/tensorexpr/lowerings.cpp.o
[ 85%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/tensorexpr/mem_dependency_checker.cpp.o
[ 85%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/tensorexpr/operators/conv2d.cpp.o
[ 85%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/tensorexpr/operators/matmul.cpp.o
[ 85%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/tensorexpr/operators/misc.cpp.o
[ 86%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/tensorexpr/operators/norm.cpp.o
[ 86%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/tensorexpr/operators/pointwise.cpp.o
[ 86%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/tensorexpr/operators/quantization.cpp.o
[ 86%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/tensorexpr/operators/reduction.cpp.o
[ 86%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/tensorexpr/operators/softmax.cpp.o
[ 86%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/tensorexpr/reduction.cpp.o
[ 86%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/tensorexpr/registerizer.cpp.o
[ 86%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/tensorexpr/tensor.cpp.o
[ 86%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/tensorexpr/types.cpp.o
[ 86%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/tensorexpr/unique_name_manager.cpp.o
[ 86%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/testing/file_check.cpp.o
[ 86%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/testing/hooks_for_testing.cpp.o
[ 86%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/lazy/backend/backend_device.cpp.o
[ 86%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/lazy/backend/backend_interface.cpp.o
[ 86%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/lazy/backend/lowering_context.cpp.o
[ 86%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/lazy/core/config.cpp.o
[ 86%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/lazy/core/debug_util.cpp.o
[ 86%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/lazy/core/hash.cpp.o
[ 86%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/lazy/core/helpers.cpp.o
[ 86%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/lazy/core/ir.cpp.o
[ 86%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/lazy/core/ir_dump_util.cpp.o
[ 86%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/lazy/core/ir_metadata.cpp.o
[ 86%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/lazy/core/ir_util.cpp.o
[ 86%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/lazy/core/lazy_graph_executor.cpp.o
[ 86%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/lazy/core/metrics.cpp.o
[ 86%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/lazy/core/multi_wait.cpp.o
[ 86%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/lazy/core/ops/arithmetic_ir_ops.cpp.o
[ 86%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/lazy/core/ops/utils.cpp.o
[ 86%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/lazy/core/permutation_util.cpp.o
[ 86%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/lazy/core/shape.cpp.o
[ 86%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/lazy/core/shape_inference.cpp.o
[ 86%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/lazy/core/tensor.cpp.o
[ 86%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/lazy/core/tensor_impl.cpp.o
[ 86%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/lazy/core/tensor_util.cpp.o
[ 86%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/lazy/core/thread_pool.cpp.o
[ 86%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/lazy/core/trie.cpp.o
[ 86%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/monitor/counters.cpp.o
[ 86%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/monitor/events.cpp.o
[ 86%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/profiler/collection.cpp.o
[ 86%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/profiler/combined_traceback.cpp.o
[ 86%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/profiler/data_flow.cpp.o
[ 86%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/profiler/kineto_client_interface.cpp.o
[ 86%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/profiler/kineto_shim.cpp.o
[ 86%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/profiler/orchestration/observer.cpp.o
[ 86%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/profiler/orchestration/python_tracer.cpp.o
[ 86%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/profiler/orchestration/vulkan.cpp.o
[ 86%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/profiler/perf.cpp.o
[ 86%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/profiler/standalone/execution_trace_observer.cpp.o
[ 86%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/profiler/standalone/itt_observer.cpp.o
[ 86%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/profiler/standalone/nvtx_observer.cpp.o
[ 86%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/profiler/stubs/base.cpp.o
[ 86%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/profiler/unwind/unwind.cpp.o
[ 86%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/profiler/unwind/unwind_fb.cpp.o
[ 86%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/profiler/util.cpp.o
[ 86%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/utils/cpp_stacktraces.cpp.o
[ 86%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/utils/schema_info.cpp.o
[ 86%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/utils/tensor_flatten.cpp.o
[ 86%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/utils/variadic.cpp.o
[ 86%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/codegen/cuda/interface.cpp.o
[ 86%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/autocast.cpp.o
[ 87%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/lower_graph.cpp.o
[ 87%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/remove_inplace_ops.cpp.o
[ 87%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/utils/check_alias_annotation.cpp.o
[ 87%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/runtime/register_c10_ops.cpp.o
[ 87%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/runtime/register_prim_ops.cpp.o
[ 87%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/runtime/register_prim_ops_fulljit.cpp.o
[ 87%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/runtime/register_special_ops.cpp.o
[ 87%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/mobile/debug_info.cpp.o
[ 87%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/lazy/ts_backend/dynamic_ir.cpp.o
[ 87%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/lazy/ts_backend/config.cpp.o
[ 87%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/lazy/ts_backend/ops/device_data.cpp.o
[ 87%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/lazy/ts_backend/ops/generic.cpp.o
[ 87%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/lazy/ts_backend/tensor_aten_ops.cpp.o
[ 87%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/lazy/ts_backend/ts_autograd_functions.cpp.o
[ 87%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/lazy/ts_backend/ts_backend_impl.cpp.o
[ 87%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/lazy/ts_backend/ts_eager_fallback.cpp.o
[ 87%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/lazy/ts_backend/ts_lowering_context.cpp.o
[ 87%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/lazy/ts_backend/ts_native_functions.cpp.o
[ 87%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/lazy/ts_backend/ts_node.cpp.o
[ 87%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/lazy/ts_backend/ts_node_lowering.cpp.o
[ 87%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/mobile/import_data.cpp.o
[ 87%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/mobile/train/export_data.cpp.o
[ 87%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/mobile/train/optim/sgd.cpp.o
[ 87%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/mobile/train/random.cpp.o
[ 87%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/mobile/train/sequential.cpp.o
[ 87%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/serialization/flatbuffer_serializer.cpp.o
[ 87%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/autograd/FunctionsManual.cpp.o
[ 87%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/utils/out_types.cpp.o
[ 87%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/autograd/TraceTypeManual.cpp.o
[ 87%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/autograd/VariableTypeManual.cpp.o
[ 87%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/api/src/jit.cpp.o
[ 87%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/mobile/compatibility/backport.cpp.o
[ 87%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/mobile/compatibility/backport_manager.cpp.o
[ 87%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/serialization/onnx.cpp.o
[ 87%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/serialization/export.cpp.o
[ 87%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/serialization/export_bytecode.cpp.o
[ 87%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/serialization/export_module.cpp.o
[ 87%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/codegen/fuser/cpu/fused_kernel.cpp.o
[ 87%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/api/module_save.cpp.o
[ 87%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/utils/byte_order.cpp.o
[ 87%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/c10d/Backend.cpp.o
[ 87%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/c10d/FileStore.cpp.o
[ 87%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/c10d/Functional.cpp.o
[ 87%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/c10d/GlooDeviceFactory.cpp.o
[ 87%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/c10d/GroupRegistry.cpp.o
[ 87%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/c10d/Ops.cpp.o
[ 87%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/c10d/ParamCommsUtils.cpp.o
[ 87%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/c10d/PrefixStore.cpp.o
[ 87%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/c10d/ProcessGroup.cpp.o
[ 87%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/c10d/ProcessGroupGloo.cpp.o
[ 87%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/c10d/ProcessGroupMPI.cpp.o
[ 87%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/c10d/ProcessGroupWrapper.cpp.o
[ 87%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/c10d/Store.cpp.o
[ 87%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/c10d/TCPStore.cpp.o
[ 87%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/c10d/TCPStoreBackend.cpp.o
[ 87%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/c10d/TCPStoreLibUvBackend.cpp.o
[ 87%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/c10d/Utils.cpp.o
[ 87%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/c10d/comm.cpp.o
[ 87%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/c10d/debug.cpp.o
[ 88%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/c10d/default_comm_hooks.cpp.o
[ 88%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/c10d/logger.cpp.o
[ 88%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/c10d/logging.cpp.o
[ 88%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/c10d/quantization/quantization.cpp.o
[ 88%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/c10d/reducer.cpp.o
[ 88%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/c10d/sequence_num.cpp.o
[ 88%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/c10d/socket.cpp.o
[ 88%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/c10d/Work.cpp.o
[ 88%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/autograd/autograd.cpp.o
[ 88%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/autograd/utils.cpp.o
[ 88%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/autograd/context/container.cpp.o
[ 88%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/autograd/context/context.cpp.o
[ 88%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/autograd/engine/dist_engine.cpp.o
[ 88%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/autograd/functions/recvrpc_backward.cpp.o
[ 88%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/autograd/functions/sendrpc_backward.cpp.o
[ 88%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/autograd/rpc_messages/autograd_metadata.cpp.o
[ 88%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/autograd/rpc_messages/propagate_gradients_req.cpp.o
[ 88%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/autograd/rpc_messages/propagate_gradients_resp.cpp.o
[ 88%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/autograd/rpc_messages/cleanup_autograd_context_req.cpp.o
[ 88%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/autograd/rpc_messages/cleanup_autograd_context_resp.cpp.o
[ 88%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/autograd/rpc_messages/rpc_with_autograd.cpp.o
[ 88%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/autograd/rpc_messages/rpc_with_profiling_req.cpp.o
[ 88%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/autograd/rpc_messages/rpc_with_profiling_resp.cpp.o
[ 88%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/autograd/rpc_messages/rref_backward_req.cpp.o
[ 88%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/autograd/rpc_messages/rref_backward_resp.cpp.o
[ 88%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/c10d/HashStore.cpp.o
[ 88%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/c10d/ProcessGroupRoundRobin.cpp.o
[ 88%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/rpc/agent_utils.cpp.o
[ 88%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/rpc/message.cpp.o
[ 88%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/rpc/profiler/remote_profiler_manager.cpp.o
[ 88%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/rpc/profiler/server_process_global_profiler.cpp.o
[ 88%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/rpc/python_call.cpp.o
[ 88%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/rpc/python_remote_call.cpp.o
[ 88%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/rpc/python_resp.cpp.o
[ 88%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/rpc/request_callback.cpp.o
[ 88%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/rpc/request_callback_no_python.cpp.o
[ 88%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/rpc/rpc_agent.cpp.o
[ 88%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/rpc/rref_context.cpp.o
[ 88%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/rpc/rref_impl.cpp.o
[ 88%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/rpc/rref_proto.cpp.o
[ 88%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/rpc/script_call.cpp.o
[ 88%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/rpc/script_remote_call.cpp.o
[ 88%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/rpc/script_resp.cpp.o
[ 88%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/rpc/tensorpipe_agent.cpp.o
[ 88%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/rpc/tensorpipe_utils.cpp.o
[ 88%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/rpc/testing/faulty_tensorpipe_agent.cpp.o
[ 88%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/rpc/torchscript_functions.cpp.o
[ 88%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/rpc/types.cpp.o
[ 88%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/rpc/utils.cpp.o
[ 88%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/api/src/cuda.cpp.o
[ 88%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/api/src/data/datasets/mnist.cpp.o
[ 88%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/api/src/data/samplers/distributed.cpp.o
[ 88%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/api/src/data/samplers/random.cpp.o
[ 88%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/api/src/data/samplers/sequential.cpp.o
[ 88%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/api/src/data/samplers/stream.cpp.o
[ 88%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/api/src/enum.cpp.o
[ 88%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/api/src/imethod.cpp.o
[ 88%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/api/src/serialize.cpp.o
[ 88%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/api/src/mps.cpp.o
[ 88%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/api/src/nn/init.cpp.o
[ 89%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/api/src/nn/module.cpp.o
[ 89%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/api/src/nn/modules/_functions.cpp.o
[ 89%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/api/src/nn/modules/activation.cpp.o
[ 89%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/api/src/nn/modules/adaptive.cpp.o
[ 89%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/api/src/nn/modules/batchnorm.cpp.o
[ 89%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/api/src/nn/modules/normalization.cpp.o
[ 89%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/api/src/nn/modules/instancenorm.cpp.o
[ 89%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/api/src/nn/modules/conv.cpp.o
[ 89%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/api/src/nn/modules/dropout.cpp.o
[ 89%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/api/src/nn/modules/distance.cpp.o
[ 89%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/api/src/nn/modules/embedding.cpp.o
[ 89%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/api/src/nn/modules/fold.cpp.o
[ 89%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/api/src/nn/modules/linear.cpp.o
[ 89%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/api/src/nn/modules/loss.cpp.o
[ 89%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/api/src/nn/modules/padding.cpp.o
[ 89%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/api/src/nn/modules/pixelshuffle.cpp.o
[ 89%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/api/src/nn/modules/pooling.cpp.o
[ 89%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/api/src/nn/modules/rnn.cpp.o
[ 89%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/api/src/nn/modules/upsampling.cpp.o
[ 89%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/api/src/nn/modules/transformer.cpp.o
[ 89%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/api/src/nn/modules/container/functional.cpp.o
[ 89%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/api/src/nn/options/activation.cpp.o
[ 89%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/api/src/nn/options/adaptive.cpp.o
[ 89%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/api/src/nn/options/batchnorm.cpp.o
[ 89%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/api/src/nn/options/embedding.cpp.o
[ 89%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/api/src/nn/options/instancenorm.cpp.o
[ 89%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/api/src/nn/options/normalization.cpp.o
[ 89%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/api/src/nn/options/conv.cpp.o
[ 89%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/api/src/nn/options/dropout.cpp.o
[ 89%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/api/src/nn/options/linear.cpp.o
[ 89%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/api/src/nn/options/padding.cpp.o
[ 89%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/api/src/nn/options/pooling.cpp.o
[ 89%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/api/src/nn/options/rnn.cpp.o
[ 89%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/api/src/nn/options/vision.cpp.o
[ 89%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/api/src/nn/options/transformer.cpp.o
[ 89%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/api/src/optim/adagrad.cpp.o
[ 89%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/api/src/optim/adam.cpp.o
[ 89%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/api/src/optim/adamw.cpp.o
[ 89%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/api/src/optim/lbfgs.cpp.o
[ 89%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/api/src/optim/optimizer.cpp.o
[ 89%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/api/src/optim/rmsprop.cpp.o
[ 89%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/api/src/optim/serialize.cpp.o
[ 89%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/api/src/optim/sgd.cpp.o
[ 89%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/api/src/optim/schedulers/lr_scheduler.cpp.o
[ 89%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/api/src/optim/schedulers/step_lr.cpp.o
[ 89%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/api/src/optim/schedulers/reduce_on_plateau_scheduler.cpp.o
[ 89%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/api/src/serialize/input-archive.cpp.o
[ 89%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/api/src/serialize/output-archive.cpp.o
[ 89%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/api/src/xpu.cpp.o
[ 89%] Linking CXX
shared library ../lib/libtorch_cpu.so Warning: Unused direct dependencies: libc10.so.2.4 /lib64/libqnnpack.so.1 /lib64/libgloo_cuda.so.1 /lib64/liblmdb.so.0.0.0 /lib64/libleveldb.so.1 /lib64/libsnappy.so.1 /lib64/libzmq.so.5 /lib64/libhiredis.so.1.0.0 /lib64/libopencv_highgui.so.409 /lib64/libopencv_optflow.so.409 /lib64/libopencv_videoio.so.409 /lib64/libonnx_optimizer.so /lib64/libfoxi_loader.so.1 /lib64/libsleef.so.3 /lib64/libopencv_ximgproc.so.409 /lib64/libopencv_imgcodecs.so.409 /lib64/libopencv_video.so.409 /lib64/libopencv_dnn.so.409 /lib64/libopencv_calib3d.so.409 /lib64/libopencv_features2d.so.409 /lib64/libopencv_imgproc.so.409 /lib64/libopencv_flann.so.409 /lib64/libopencv_core.so.409 /lib64/libopencv_cudev.so.409 /usr/local/cuda-12.3/lib64/libcudart.so.12 [ 89%] Built target torch_cpu [ 89%] Building CXX object caffe2/torch/lib/libshm/CMakeFiles/shm.dir/core.cpp.o [ 89%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/cuda/CUDAGeneratorImpl.cpp.o [ 89%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/cuda/CUDAContext.cpp.o [ 89%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/cuda/CUDAGraph.cpp.o [ 89%] Linking CXX shared library ../../../../lib/libshm.so Warning: Unused direct dependencies: libtorch_cpu.so.2.4 /lib64/libprotobuf.so.32 libc10.so.2.4 /lib64/libgflags.so.2.2 /lib64/libglog.so.0 /lib64/libqnnpack.so.1 /lib64/libgloo.so.1 /lib64/libgloo_cuda.so.1 /lib64/libm.so.6 [ 89%] Built target shm [ 89%] Building CXX object caffe2/torch/lib/libshm/CMakeFiles/torch_shm_manager.dir/manager.cpp.o [ 90%] Linking CXX executable ../../../../bin/torch_shm_manager [ 90%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/cuda/CUDASparseDescriptors.cpp.o Warning: Unused direct dependencies: libshm.so.2.4 libc10.so.2.4 /lib64/libgflags.so.2.2 /lib64/libglog.so.0 /lib64/libm.so.6 [ 90%] Built target torch_shm_manager [ 90%] Building CXX object 
caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/cuda/CachingHostAllocator.cpp.o [ 90%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/cuda/CuSparseHandlePool.cpp.o [ 90%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/cuda/EmptyTensor.cpp.o [ 90%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/cuda/Exceptions.cpp.o [ 91%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/cuda/PeerToPeerAccess.cpp.o [ 91%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/cuda/PinnedMemoryAllocator.cpp.o [ 91%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/cuda/detail/CUDAHooks.cpp.o [ 91%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/cuda/detail/LazyNVRTC.cpp.o [ 91%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/cuda/llvm_basic.cpp.o [ 91%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/cuda/llvm_complex.cpp.o [ 91%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/Resize.cpp.o [ 91%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/SpectralOps.cpp.o [ 91%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/TensorCompare.cpp.o [ 91%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cudnn/AffineGridGenerator.cpp.o [ 91%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cudnn/BatchNorm.cpp.o [ 91%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cudnn/ConvPlaceholders.cpp.o [ 91%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cudnn/ConvShared.cpp.o [ 91%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cudnn/Conv_v7.cpp.o [ 91%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cudnn/Conv_v8.cpp.o [ 91%] Building 
CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cudnn/GridSampler.cpp.o [ 91%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cudnn/LossCTC.cpp.o [ 91%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cudnn/MHA.cpp.o [ 91%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cudnn/RNN.cpp.o [ 91%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/miopen/BatchNorm_miopen.cpp.o [ 91%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/miopen/Conv_miopen.cpp.o [ 91%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/miopen/RNN_miopen.cpp.o [ 91%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/nested/cuda/NestedTensorTransformerUtils.cpp.o [ 91%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/quantized/cuda/Activation.cpp.o [ 91%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/quantized/cudnn/BinaryOps.cpp.o [ 91%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/quantized/cudnn/Conv.cpp.o [ 91%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/quantized/cudnn/ConvPrepack.cpp.o [ 91%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/quantized/cudnn/ConvUnpackImpl.cpp.o [ 91%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/quantized/cudnn/Linear.cpp.o [ 91%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/quantized/cudnn/LinearPrepack.cpp.o [ 91%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/quantized/cudnn/LinearUnpackImpl.cpp.o [ 91%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/quantized/cudnn/Pooling.cpp.o [ 91%] Building CXX object 
caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/sparse/cuda/cuSPARSELtOps.cpp.o [ 91%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/sdp_utils.cpp.o [ 91%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/cudnn/AutocastRNN.cpp.o [ 91%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/cudnn/Descriptors.cpp.o [ 91%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/cudnn/Handle.cpp.o [ 91%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/cudnn/Types.cpp.o [ 91%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/torch/csrc/cuda/nccl.cpp.o [ 91%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/torch/csrc/distributed/c10d/reducer_cuda.cpp.o [ 91%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/torch/csrc/distributed/c10d/NCCLUtils.cpp.o [ 91%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp.o [ 91%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/torch/csrc/distributed/c10d/ProcessGroupUCC.cpp.o [ 91%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/torch/csrc/distributed/c10d/UCCTracing.cpp.o [ 91%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/torch/csrc/distributed/c10d/UCCUtils.cpp.o [ 91%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/torch/csrc/distributed/c10d/intra_node_comm.cpp.o [ 91%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/torch/csrc/distributed/c10d/intra_node_comm.cu.o [ 91%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/torch/csrc/distributed/rpc/tensorpipe_cuda.cpp.o [ 91%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/torch/csrc/distributed/c10d/quantization/quantization_gpu.cu.o [ 91%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/torch/csrc/inductor/aoti_torch/generated/c_shim_cuda.cpp.o [ 91%] Building CUDA object 
caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/TensorFactories.cu.o [ 91%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/cuda/Sleep.cu.o [ 91%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/cuda/cub-RadixSortKeys.cu.o [ 91%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/cuda/cub-RadixSortPairs.cu.o [ 91%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/cuda/cub.cu.o [ 91%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/cuda/detail/IndexUtils.cu.o [ 91%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/cuda/jiterator.cu.o [ 91%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/AbsKernel.cu.o [ 91%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/ActivationEluKernel.cu.o [ 92%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/ActivationGeluKernel.cu.o [ 92%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/ActivationGluKernel.cu.o [ 92%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/ActivationHardshrinkKernel.cu.o [ 92%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/ActivationHardsigmoidKernel.cu.o [ 92%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/ActivationHardswishKernel.cu.o [ 92%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/ActivationHardtanhKernel.cu.o [ 92%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/ActivationLeakyReluKernel.cu.o [ 92%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/ActivationLogSigmoidKernel.cu.o [ 92%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/ActivationMishKernel.cu.o [ 92%] Building 
CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/ActivationPreluKernel.cu.o [ 92%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/ActivationSiluKernel.cu.o [ 92%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/ActivationSoftplusKernel.cu.o [ 92%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/ActivationSoftshrinkKernel.cu.o [ 92%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/ActivationThresholdKernel.cu.o [ 92%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/AdaptiveAveragePooling.cu.o [ 92%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/AdaptiveAveragePooling3d.cu.o [ 92%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/AdaptiveMaxPooling2d.cu.o [ 92%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/AdaptiveMaxPooling3d.cu.o [ 92%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/AmpKernels.cu.o [ 92%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/AveragePool2d.cu.o [ 92%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/AveragePool3d.cu.o [ 92%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/BinaryBitwiseOpsKernels.cu.o [ 92%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/BinaryDivFloorKernel.cu.o [ 92%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/BinaryDivTrueKernel.cu.o [ 92%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/BinaryDivTruncKernel.cu.o [ 92%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/BinaryGeometricKernels.cu.o [ 92%] Building CUDA object 
caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/BinaryLogicalOpsKernels.cu.o [ 92%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/BinaryMiscBackwardOpsKernels.cu.o [ 92%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/BinaryMiscOpsKernels.cu.o [ 92%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/BinaryMulKernel.cu.o [ 92%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/BinaryRemainderKernel.cu.o [ 92%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/BinaryShiftOpsKernels.cu.o [ 92%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/Bucketization.cu.o [ 92%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/CUDAScalar.cu.o [ 92%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/Col2Im.cu.o [ 92%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/CompareEQKernel.cu.o [ 92%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/CompareKernels.cu.o [ 92%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/ComplexKernel.cu.o [ 92%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/ConvolutionMM2d.cu.o [ 92%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/Copy.cu.o [ 92%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/CopysignKernel.cu.o [ 92%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/CrossKernel.cu.o [ 92%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/CumminmaxKernel.cu.o [ 92%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/CumprodKernel.cu.o [ 92%] Building CUDA object 
caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/CumsumKernel.cu.o [ 92%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/DepthwiseConv2d.cu.o [ 92%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/DepthwiseConv3d.cu.o [ 92%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/DilatedMaxPool2d.cu.o [ 92%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/DilatedMaxPool3d.cu.o [ 92%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/DistanceKernel.cu.o [ 92%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/DistributionBernoulli.cu.o [ 92%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/DistributionCauchyKernel.cu.o [ 92%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/DistributionExponentialKernel.cu.o [ 92%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/DistributionGeometricKernel.cu.o [ 92%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/DistributionLogNormalKernel.cu.o [ 92%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/DistributionNormal.cu.o [ 92%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/DistributionRandomKernel.cu.o [ 92%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/DistributionUniform.cu.o [ 92%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/Distributions.cu.o [ 92%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/Dropout.cu.o [ 93%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/Embedding.cu.o [ 93%] Building CUDA object 
caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/EmbeddingBackwardKernel.cu.o [ 93%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/EmbeddingBag.cu.o [ 93%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/FillKernel.cu.o [ 93%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/FlattenIndicesKernel.cu.o [ 93%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/ForeachBinaryOpList.cu.o [ 93%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/ForeachBinaryOpScalar.cu.o [ 93%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/ForeachBinaryOpScalarList.cu.o [ 93%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/ForeachBinaryOpScalarTensor.cu.o [ 93%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/ForeachPointwiseOp.cu.o [ 93%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/ForeachReduceOp.cu.o [ 93%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/ForeachTernaryOp.cu.o [ 93%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/ForeachUnaryOp.cu.o [ 93%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/FractionalMaxPool2d.cu.o [ 93%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/FractionalMaxPool3d.cu.o [ 93%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/FunctionOfAMatrixUtilsKernel.cu.o [ 93%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/FusedAdamKernel.cu.o [ 93%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/FusedAdamWKernel.cu.o [ 93%] Building CUDA object 
caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/FusedSgdKernel.cu.o [ 93%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/GcdLcmKernel.cu.o [ 93%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/GridSampler.cu.o [ 93%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/IGammaKernel.cu.o [ 93%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/Im2Col.cu.o [ 93%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/IndexKernel.cu.o [ 93%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/Indexing.cu.o [ 93%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/LegacyThrustHelpers.cu.o [ 93%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/Lerp.cu.o [ 93%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/LinearAlgebra.cu.o [ 93%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/LogAddExpKernel.cu.o [ 93%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/LogcumsumexpKernel.cu.o [ 93%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/Loss.cu.o [ 93%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/LossCTC.cu.o [ 93%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/MaxMinElementwiseKernel.cu.o [ 93%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/MaxUnpooling.cu.o [ 93%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/MixedDtypesLinear.cu.o [ 93%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/MultiLabelMarginCriterion.cu.o [ 93%] Building CUDA object 
caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/MultiMarginLoss.cu.o [ 93%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/MultinomialKernel.cu.o [ 93%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/NLLLoss2d.cu.o [ 93%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/NaiveConvolutionTranspose2d.cu.o [ 93%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/NaiveConvolutionTranspose3d.cu.o [ 93%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/NaiveDilatedConvolution.cu.o [ 93%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/Nonzero.cu.o [ 93%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/Normalization.cu.o /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cuh(122): warning #20054-D: dynamic initialization is not supported for a function-scope static __shared__ variable within a __device__/__global__ function scalar_t shared[32]; ^ detected during: instantiation of "scalar_t at::native::reduce(Op, PTA, int) [with scalar_t=at::native::Float2, Op=at::native::GradOp>, PTA=at::GenericPackedTensorAccessor]" at line 389 instantiation of "void at::native::batch_norm_backward_kernel(at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, __nv_bool, stat_accscalar_t) [with input_scalar_t=double, stat_scalar_t=double, stat_accscalar_t=double, index_t=int32_t]" at line 654 instantiation of "std::tuple at::native::batch_norm_backward_cuda_template(const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, 
const at::Tensor &, const at::Tensor &, const at::Tensor &, __nv_bool, double, std::array<__nv_bool, 3UL>) [with input_scalar_t=double, stat_scalar_t=double, index_t=int32_t]" at line 585 of /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cu Remark: The warnings can be suppressed with "-diag-suppress " /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cuh(122): warning #20054-D: dynamic initialization is not supported for a function-scope static __shared__ variable within a __device__/__global__ function scalar_t shared[32]; ^ detected during: instantiation of "scalar_t at::native::reduce(Op, PTA, int) [with scalar_t=at::native::Float2, Op=at::native::GradOp>, PTA=at::GenericPackedTensorAccessor]" at line 389 instantiation of "void at::native::batch_norm_backward_kernel(at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, __nv_bool, stat_accscalar_t) [with input_scalar_t=float, stat_scalar_t=float, stat_accscalar_t=float, index_t=int32_t]" at line 654 instantiation of "std::tuple at::native::batch_norm_backward_cuda_template(const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, __nv_bool, double, std::array<__nv_bool, 3UL>) [with input_scalar_t=float, stat_scalar_t=float, index_t=int32_t]" at line 585 of /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cu /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cuh(122): warning #20054-D: dynamic initialization is not supported for a function-scope static __shared__ variable within a __device__/__global__ function scalar_t shared[32]; ^ detected during: instantiation of "scalar_t 
at::native::reduce(Op, PTA, int) [with scalar_t=at::native::Float2, Op=at::native::GradOp>, PTA=at::GenericPackedTensorAccessor]" at line 389 instantiation of "void at::native::batch_norm_backward_kernel(at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, __nv_bool, stat_accscalar_t) [with input_scalar_t=c10::Half, stat_scalar_t=float, stat_accscalar_t=float, index_t=int32_t]" at line 654 instantiation of "std::tuple at::native::batch_norm_backward_cuda_template(const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, __nv_bool, double, std::array<__nv_bool, 3UL>) [with input_scalar_t=c10::Half, stat_scalar_t=float, index_t=int32_t]" at line 585 of /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cu /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cuh(122): warning #20054-D: dynamic initialization is not supported for a function-scope static __shared__ variable within a __device__/__global__ function scalar_t shared[32]; ^ detected during: instantiation of "scalar_t at::native::reduce(Op, PTA, int) [with scalar_t=at::native::Float2, Op=at::native::GradOp>, PTA=at::GenericPackedTensorAccessor]" at line 389 instantiation of "void at::native::batch_norm_backward_kernel(at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, __nv_bool, stat_accscalar_t) [with input_scalar_t=c10::BFloat16, 
stat_scalar_t=float, stat_accscalar_t=float, index_t=int32_t]" at line 654
            instantiation of "std::tuple at::native::batch_norm_backward_cuda_template(const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, __nv_bool, double, std::array<__nv_bool, 3UL>) [with input_scalar_t=c10::BFloat16, stat_scalar_t=float, index_t=int32_t]" at line 585 of /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cu
/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cuh(122): warning #20054-D: dynamic initialization is not supported for a function-scope static __shared__ variable within a __device__/__global__ function
      scalar_t shared[32];
               ^
          detected during:
            instantiation of "scalar_t at::native::reduce(Op, PTA, int) [with scalar_t=at::native::Float2, Op=at::native::GradOp>, PTA=at::GenericPackedTensorAccessor]" at line 489
            instantiation of "void at::native::batch_norm_backward_reduce_kernel(at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor) [with input_scalar_t=double, stat_scalar_t=double, stat_accscalar_t=double, index_t=int32_t]" at line 831
            instantiation of "std::tuple at::native::batch_norm_backward_reduce_cuda_template(const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, __nv_bool, __nv_bool, __nv_bool) [with input_scalar_t=double, stat_scalar_t=double, index_t=int32_t]" at line 739 of /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cu
/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cuh(122): warning #20054-D: dynamic initialization is not supported for a function-scope static __shared__ variable within a __device__/__global__ function
      scalar_t shared[32];
               ^
          detected during:
            instantiation of "scalar_t at::native::reduce(Op, PTA, int) [with scalar_t=at::native::Float2, Op=at::native::GradOp>, PTA=at::GenericPackedTensorAccessor]" at line 489
            instantiation of "void at::native::batch_norm_backward_reduce_kernel(at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor) [with input_scalar_t=double, stat_scalar_t=double, stat_accscalar_t=double, index_t=int64_t]" at line 831
            instantiation of "std::tuple at::native::batch_norm_backward_reduce_cuda_template(const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, __nv_bool, __nv_bool, __nv_bool) [with input_scalar_t=double, stat_scalar_t=double, index_t=int64_t]" at line 739 of /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cu
/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cuh(122): warning #20054-D: dynamic initialization is not supported for a function-scope static __shared__ variable within a __device__/__global__ function
      scalar_t shared[32];
               ^
          detected during:
            instantiation of "scalar_t at::native::reduce(Op, PTA, int) [with scalar_t=at::native::Float2, Op=at::native::GradOp>, PTA=at::GenericPackedTensorAccessor]" at line 489
            instantiation of "void at::native::batch_norm_backward_reduce_kernel(at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor) [with input_scalar_t=float, stat_scalar_t=float, stat_accscalar_t=float, index_t=int32_t]" at line 831
            instantiation of "std::tuple at::native::batch_norm_backward_reduce_cuda_template(const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, __nv_bool, __nv_bool, __nv_bool) [with input_scalar_t=float, stat_scalar_t=float, index_t=int32_t]" at line 739 of /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cu
/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cuh(122): warning #20054-D: dynamic initialization is not supported for a function-scope static __shared__ variable within a __device__/__global__ function
      scalar_t shared[32];
               ^
          detected during:
            instantiation of "scalar_t at::native::reduce(Op, PTA, int) [with scalar_t=at::native::Float2, Op=at::native::GradOp>, PTA=at::GenericPackedTensorAccessor]" at line 489
            instantiation of "void at::native::batch_norm_backward_reduce_kernel(at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor) [with input_scalar_t=float, stat_scalar_t=float, stat_accscalar_t=float, index_t=int64_t]" at line 831
            instantiation of "std::tuple at::native::batch_norm_backward_reduce_cuda_template(const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, __nv_bool, __nv_bool, __nv_bool) [with input_scalar_t=float, stat_scalar_t=float, index_t=int64_t]" at line 739 of /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cu
/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cuh(122): warning #20054-D: dynamic initialization is not supported for a function-scope static __shared__ variable within a __device__/__global__ function
      scalar_t shared[32];
               ^
          detected during:
            instantiation of "scalar_t at::native::reduce(Op, PTA, int) [with scalar_t=at::native::Float2, Op=at::native::GradOp>, PTA=at::GenericPackedTensorAccessor]" at line 489
            instantiation of "void at::native::batch_norm_backward_reduce_kernel(at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor) [with input_scalar_t=c10::Half, stat_scalar_t=float, stat_accscalar_t=float, index_t=int32_t]" at line 831
            instantiation of "std::tuple at::native::batch_norm_backward_reduce_cuda_template(const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, __nv_bool, __nv_bool, __nv_bool) [with input_scalar_t=c10::Half, stat_scalar_t=float, index_t=int32_t]" at line 739 of /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cu
/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cuh(122): warning #20054-D: dynamic initialization is not supported for a function-scope static __shared__ variable within a __device__/__global__ function
      scalar_t shared[32];
               ^
          detected during:
            instantiation of "scalar_t at::native::reduce(Op, PTA, int) [with scalar_t=at::native::Float2, Op=at::native::GradOp>, PTA=at::GenericPackedTensorAccessor]" at line 489
            instantiation of "void at::native::batch_norm_backward_reduce_kernel(at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor) [with input_scalar_t=c10::Half, stat_scalar_t=float, stat_accscalar_t=float, index_t=int64_t]" at line 831
            instantiation of "std::tuple at::native::batch_norm_backward_reduce_cuda_template(const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, __nv_bool, __nv_bool, __nv_bool) [with input_scalar_t=c10::Half, stat_scalar_t=float, index_t=int64_t]" at line 739 of /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cu
/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cuh(122): warning #20054-D: dynamic initialization is not supported for a function-scope static __shared__ variable within a __device__/__global__ function
      scalar_t shared[32];
               ^
          detected during:
            instantiation of "scalar_t at::native::reduce(Op, PTA, int) [with scalar_t=at::native::Float2, Op=at::native::GradOp>, PTA=at::GenericPackedTensorAccessor]" at line 489
            instantiation of "void at::native::batch_norm_backward_reduce_kernel(at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor) [with input_scalar_t=c10::BFloat16, stat_scalar_t=float, stat_accscalar_t=float, index_t=int32_t]" at line 831
            instantiation of "std::tuple at::native::batch_norm_backward_reduce_cuda_template(const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, __nv_bool, __nv_bool, __nv_bool) [with input_scalar_t=c10::BFloat16, stat_scalar_t=float, index_t=int32_t]" at line 739 of /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cu
/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cuh(122): warning #20054-D: dynamic initialization is not supported for a function-scope static __shared__ variable within a __device__/__global__ function
      scalar_t shared[32];
               ^
          detected during:
            instantiation of "scalar_t at::native::reduce(Op, PTA, int) [with scalar_t=at::native::Float2, Op=at::native::GradOp>, PTA=at::GenericPackedTensorAccessor]" at line 489
            instantiation of "void at::native::batch_norm_backward_reduce_kernel(at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor) [with input_scalar_t=c10::BFloat16, stat_scalar_t=float, stat_accscalar_t=float, index_t=int64_t]" at line 831
            instantiation of "std::tuple at::native::batch_norm_backward_reduce_cuda_template(const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, __nv_bool, __nv_bool, __nv_bool) [with input_scalar_t=c10::BFloat16, stat_scalar_t=float, index_t=int64_t]" at line 739 of /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cu
[ 93%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/PointwiseOpsKernel.cu.o
[ 93%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/PowKernel.cu.o
/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cuh(122): warning #20054-D: dynamic initialization is not supported for a function-scope static __shared__ variable within a __device__/__global__ function
      scalar_t shared[32];
               ^
          detected during:
            instantiation of "scalar_t at::native::reduce(Op, PTA, int) [with scalar_t=at::native::Float2, Op=at::native::GradOp>, PTA=at::GenericPackedTensorAccessor]" at line 389
            instantiation of "void at::native::batch_norm_backward_kernel(at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, __nv_bool, stat_accscalar_t) [with input_scalar_t=double, stat_scalar_t=double, stat_accscalar_t=double, index_t=int32_t]" at line 654
            instantiation of "std::tuple at::native::batch_norm_backward_cuda_template(const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, __nv_bool, double, std::array<__nv_bool, 3UL>) [with input_scalar_t=double, stat_scalar_t=double, index_t=int32_t]" at line 585 of /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cu
Remark: The warnings can be suppressed with "-diag-suppress "
/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cuh(122): warning #20054-D: dynamic initialization is not supported for a function-scope static __shared__ variable within a __device__/__global__ function
      scalar_t shared[32];
               ^
          detected during:
            instantiation of "scalar_t at::native::reduce(Op, PTA, int) [with scalar_t=at::native::Float2, Op=at::native::GradOp>, PTA=at::GenericPackedTensorAccessor]" at line 389
            instantiation of "void at::native::batch_norm_backward_kernel(at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, __nv_bool, stat_accscalar_t) [with input_scalar_t=float, stat_scalar_t=float, stat_accscalar_t=float, index_t=int32_t]" at line 654
            instantiation of "std::tuple at::native::batch_norm_backward_cuda_template(const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, __nv_bool, double, std::array<__nv_bool, 3UL>) [with input_scalar_t=float, stat_scalar_t=float, index_t=int32_t]" at line 585 of /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cu
/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cuh(122): warning #20054-D: dynamic initialization is not supported for a function-scope static __shared__ variable within a __device__/__global__ function
      scalar_t shared[32];
               ^
          detected during:
            instantiation of "scalar_t at::native::reduce(Op, PTA, int) [with scalar_t=at::native::Float2, Op=at::native::GradOp>, PTA=at::GenericPackedTensorAccessor]" at line 389
            instantiation of "void at::native::batch_norm_backward_kernel(at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, __nv_bool, stat_accscalar_t) [with input_scalar_t=c10::Half, stat_scalar_t=float, stat_accscalar_t=float, index_t=int32_t]" at line 654
            instantiation of "std::tuple at::native::batch_norm_backward_cuda_template(const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, __nv_bool, double, std::array<__nv_bool, 3UL>) [with input_scalar_t=c10::Half, stat_scalar_t=float, index_t=int32_t]" at line 585 of /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cu
/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cuh(122): warning #20054-D: dynamic initialization is not supported for a function-scope static __shared__ variable within a __device__/__global__ function
      scalar_t shared[32];
               ^
          detected during:
            instantiation of "scalar_t at::native::reduce(Op, PTA, int) [with scalar_t=at::native::Float2, Op=at::native::GradOp>, PTA=at::GenericPackedTensorAccessor]" at line 389
            instantiation of "void at::native::batch_norm_backward_kernel(at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, __nv_bool, stat_accscalar_t) [with input_scalar_t=c10::BFloat16, stat_scalar_t=float, stat_accscalar_t=float, index_t=int32_t]" at line 654
            instantiation of "std::tuple at::native::batch_norm_backward_cuda_template(const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, __nv_bool, double, std::array<__nv_bool, 3UL>) [with input_scalar_t=c10::BFloat16, stat_scalar_t=float, index_t=int32_t]" at line 585 of /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cu
&, const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, __nv_bool, __nv_bool, __nv_bool) [with input_scalar_t=c10::BFloat16, stat_scalar_t=float, index_t=int64_t]" at line 739 of /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cu /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cuh(122): warning #20054-D: dynamic initialization is not supported for a function-scope static __shared__ variable within a __device__/__global__ function scalar_t shared[32]; ^ detected during: instantiation of "scalar_t at::native::reduce(Op, PTA, int) [with scalar_t=at::native::Float2, Op=at::native::GradOp>, PTA=at::GenericPackedTensorAccessor]" at line 389 instantiation of "void at::native::batch_norm_backward_kernel(at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, __nv_bool, stat_accscalar_t) [with input_scalar_t=double, stat_scalar_t=double, stat_accscalar_t=double, index_t=int32_t]" at line 654 instantiation of "std::tuple at::native::batch_norm_backward_cuda_template(const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, __nv_bool, double, std::array<__nv_bool, 3UL>) [with input_scalar_t=double, stat_scalar_t=double, index_t=int32_t]" at line 585 of /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cu Remark: The warnings can be suppressed with "-diag-suppress " /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cuh(122): warning #20054-D: dynamic initialization is not supported for a function-scope static __shared__ variable within a __device__/__global__ function scalar_t shared[32]; ^ detected during: 
instantiation of "scalar_t at::native::reduce(Op, PTA, int) [with scalar_t=at::native::Float2, Op=at::native::GradOp>, PTA=at::GenericPackedTensorAccessor]" at line 389 instantiation of "void at::native::batch_norm_backward_kernel(at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, __nv_bool, stat_accscalar_t) [with input_scalar_t=float, stat_scalar_t=float, stat_accscalar_t=float, index_t=int32_t]" at line 654 instantiation of "std::tuple at::native::batch_norm_backward_cuda_template(const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, __nv_bool, double, std::array<__nv_bool, 3UL>) [with input_scalar_t=float, stat_scalar_t=float, index_t=int32_t]" at line 585 of /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cu /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cuh(122): warning #20054-D: dynamic initialization is not supported for a function-scope static __shared__ variable within a __device__/__global__ function scalar_t shared[32]; ^ detected during: instantiation of "scalar_t at::native::reduce(Op, PTA, int) [with scalar_t=at::native::Float2, Op=at::native::GradOp>, PTA=at::GenericPackedTensorAccessor]" at line 389 instantiation of "void at::native::batch_norm_backward_kernel(at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, __nv_bool, stat_accscalar_t) [with input_scalar_t=c10::Half, 
stat_scalar_t=float, stat_accscalar_t=float, index_t=int32_t]" at line 654 instantiation of "std::tuple at::native::batch_norm_backward_cuda_template(const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, __nv_bool, double, std::array<__nv_bool, 3UL>) [with input_scalar_t=c10::Half, stat_scalar_t=float, index_t=int32_t]" at line 585 of /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cu /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cuh(122): warning #20054-D: dynamic initialization is not supported for a function-scope static __shared__ variable within a __device__/__global__ function scalar_t shared[32]; ^ detected during: instantiation of "scalar_t at::native::reduce(Op, PTA, int) [with scalar_t=at::native::Float2, Op=at::native::GradOp>, PTA=at::GenericPackedTensorAccessor]" at line 389 instantiation of "void at::native::batch_norm_backward_kernel(at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, __nv_bool, stat_accscalar_t) [with input_scalar_t=c10::BFloat16, stat_scalar_t=float, stat_accscalar_t=float, index_t=int32_t]" at line 654 instantiation of "std::tuple at::native::batch_norm_backward_cuda_template(const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, __nv_bool, double, std::array<__nv_bool, 3UL>) [with input_scalar_t=c10::BFloat16, stat_scalar_t=float, index_t=int32_t]" at line 585 of /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cu /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cuh(122): warning #20054-D: dynamic 
initialization is not supported for a function-scope static __shared__ variable within a __device__/__global__ function scalar_t shared[32]; ^ detected during: instantiation of "scalar_t at::native::reduce(Op, PTA, int) [with scalar_t=at::native::Float2, Op=at::native::GradOp>, PTA=at::GenericPackedTensorAccessor]" at line 489 instantiation of "void at::native::batch_norm_backward_reduce_kernel(at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor) [with input_scalar_t=double, stat_scalar_t=double, stat_accscalar_t=double, index_t=int32_t]" at line 831 instantiation of "std::tuple at::native::batch_norm_backward_reduce_cuda_template(const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, __nv_bool, __nv_bool, __nv_bool) [with input_scalar_t=double, stat_scalar_t=double, index_t=int32_t]" at line 739 of /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cu /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cuh(122): warning #20054-D: dynamic initialization is not supported for a function-scope static __shared__ variable within a __device__/__global__ function scalar_t shared[32]; ^ detected during: instantiation of "scalar_t at::native::reduce(Op, PTA, int) [with scalar_t=at::native::Float2, Op=at::native::GradOp>, PTA=at::GenericPackedTensorAccessor]" at line 489 instantiation of "void at::native::batch_norm_backward_reduce_kernel(at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor) [with input_scalar_t=double, stat_scalar_t=double, stat_accscalar_t=double, 
index_t=int64_t]" at line 831 instantiation of "std::tuple at::native::batch_norm_backward_reduce_cuda_template(const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, __nv_bool, __nv_bool, __nv_bool) [with input_scalar_t=double, stat_scalar_t=double, index_t=int64_t]" at line 739 of /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cu /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cuh(122): warning #20054-D: dynamic initialization is not supported for a function-scope static __shared__ variable within a __device__/__global__ function scalar_t shared[32]; ^ detected during: instantiation of "scalar_t at::native::reduce(Op, PTA, int) [with scalar_t=at::native::Float2, Op=at::native::GradOp>, PTA=at::GenericPackedTensorAccessor]" at line 489 instantiation of "void at::native::batch_norm_backward_reduce_kernel(at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor) [with input_scalar_t=float, stat_scalar_t=float, stat_accscalar_t=float, index_t=int32_t]" at line 831 instantiation of "std::tuple at::native::batch_norm_backward_reduce_cuda_template(const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, __nv_bool, __nv_bool, __nv_bool) [with input_scalar_t=float, stat_scalar_t=float, index_t=int32_t]" at line 739 of /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cu /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cuh(122): warning #20054-D: dynamic initialization is not supported for a function-scope static __shared__ variable within a __device__/__global__ function scalar_t shared[32]; ^ detected during: instantiation of "scalar_t at::native::reduce(Op, PTA, int) [with 
scalar_t=at::native::Float2, Op=at::native::GradOp>, PTA=at::GenericPackedTensorAccessor]" at line 489 instantiation of "void at::native::batch_norm_backward_reduce_kernel(at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor) [with input_scalar_t=float, stat_scalar_t=float, stat_accscalar_t=float, index_t=int64_t]" at line 831 instantiation of "std::tuple at::native::batch_norm_backward_reduce_cuda_template(const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, __nv_bool, __nv_bool, __nv_bool) [with input_scalar_t=float, stat_scalar_t=float, index_t=int64_t]" at line 739 of /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cu /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cuh(122): warning #20054-D: dynamic initialization is not supported for a function-scope static __shared__ variable within a __device__/__global__ function scalar_t shared[32]; ^ detected during: instantiation of "scalar_t at::native::reduce(Op, PTA, int) [with scalar_t=at::native::Float2, Op=at::native::GradOp>, PTA=at::GenericPackedTensorAccessor]" at line 489 instantiation of "void at::native::batch_norm_backward_reduce_kernel(at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor) [with input_scalar_t=c10::Half, stat_scalar_t=float, stat_accscalar_t=float, index_t=int32_t]" at line 831 instantiation of "std::tuple at::native::batch_norm_backward_reduce_cuda_template(const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, __nv_bool, __nv_bool, __nv_bool) 
[with input_scalar_t=c10::Half, stat_scalar_t=float, index_t=int32_t]" at line 739 of /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cu /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cuh(122): warning #20054-D: dynamic initialization is not supported for a function-scope static __shared__ variable within a __device__/__global__ function scalar_t shared[32]; ^ detected during: instantiation of "scalar_t at::native::reduce(Op, PTA, int) [with scalar_t=at::native::Float2, Op=at::native::GradOp>, PTA=at::GenericPackedTensorAccessor]" at line 489 instantiation of "void at::native::batch_norm_backward_reduce_kernel(at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor) [with input_scalar_t=c10::Half, stat_scalar_t=float, stat_accscalar_t=float, index_t=int64_t]" at line 831 instantiation of "std::tuple at::native::batch_norm_backward_reduce_cuda_template(const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, __nv_bool, __nv_bool, __nv_bool) [with input_scalar_t=c10::Half, stat_scalar_t=float, index_t=int64_t]" at line 739 of /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cu /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cuh(122): warning #20054-D: dynamic initialization is not supported for a function-scope static __shared__ variable within a __device__/__global__ function scalar_t shared[32]; ^ detected during: instantiation of "scalar_t at::native::reduce(Op, PTA, int) [with scalar_t=at::native::Float2, Op=at::native::GradOp>, PTA=at::GenericPackedTensorAccessor]" at line 489 instantiation of "void at::native::batch_norm_backward_reduce_kernel(at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, 
at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor) [with input_scalar_t=c10::BFloat16, stat_scalar_t=float, stat_accscalar_t=float, index_t=int32_t]" at line 831 instantiation of "std::tuple at::native::batch_norm_backward_reduce_cuda_template(const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, __nv_bool, __nv_bool, __nv_bool) [with input_scalar_t=c10::BFloat16, stat_scalar_t=float, index_t=int32_t]" at line 739 of /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cu /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cuh(122): warning #20054-D: dynamic initialization is not supported for a function-scope static __shared__ variable within a __device__/__global__ function scalar_t shared[32]; ^ detected during: instantiation of "scalar_t at::native::reduce(Op, PTA, int) [with scalar_t=at::native::Float2, Op=at::native::GradOp>, PTA=at::GenericPackedTensorAccessor]" at line 489 instantiation of "void at::native::batch_norm_backward_reduce_kernel(at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor) [with input_scalar_t=c10::BFloat16, stat_scalar_t=float, stat_accscalar_t=float, index_t=int64_t]" at line 831 instantiation of "std::tuple at::native::batch_norm_backward_reduce_cuda_template(const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, __nv_bool, __nv_bool, __nv_bool) [with input_scalar_t=c10::BFloat16, stat_scalar_t=float, index_t=int64_t]" at line 739 of /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cu 
/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cuh(122): warning #20054-D: dynamic initialization is not supported for a function-scope static __shared__ variable within a __device__/__global__ function scalar_t shared[32]; ^ detected during: instantiation of "scalar_t at::native::reduce(Op, PTA, int) [with scalar_t=at::native::Float2, Op=at::native::GradOp>, PTA=at::GenericPackedTensorAccessor]" at line 389 instantiation of "void at::native::batch_norm_backward_kernel(at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, __nv_bool, stat_accscalar_t) [with input_scalar_t=double, stat_scalar_t=double, stat_accscalar_t=double, index_t=int32_t]" at line 654 instantiation of "std::tuple at::native::batch_norm_backward_cuda_template(const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, __nv_bool, double, std::array<__nv_bool, 3UL>) [with input_scalar_t=double, stat_scalar_t=double, index_t=int32_t]" at line 585 of /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cu Remark: The warnings can be suppressed with "-diag-suppress " /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cuh(122): warning #20054-D: dynamic initialization is not supported for a function-scope static __shared__ variable within a __device__/__global__ function scalar_t shared[32]; ^ detected during: instantiation of "scalar_t at::native::reduce(Op, PTA, int) [with scalar_t=at::native::Float2, Op=at::native::GradOp>, PTA=at::GenericPackedTensorAccessor]" at line 389 instantiation of "void at::native::batch_norm_backward_kernel(at::GenericPackedTensorAccessor, 
at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, __nv_bool, stat_accscalar_t) [with input_scalar_t=float, stat_scalar_t=float, stat_accscalar_t=float, index_t=int32_t]" at line 654 instantiation of "std::tuple at::native::batch_norm_backward_cuda_template(const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, __nv_bool, double, std::array<__nv_bool, 3UL>) [with input_scalar_t=float, stat_scalar_t=float, index_t=int32_t]" at line 585 of /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cu /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cuh(122): warning #20054-D: dynamic initialization is not supported for a function-scope static __shared__ variable within a __device__/__global__ function scalar_t shared[32]; ^ detected during: instantiation of "scalar_t at::native::reduce(Op, PTA, int) [with scalar_t=at::native::Float2, Op=at::native::GradOp>, PTA=at::GenericPackedTensorAccessor]" at line 389 instantiation of "void at::native::batch_norm_backward_kernel(at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, __nv_bool, stat_accscalar_t) [with input_scalar_t=c10::Half, stat_scalar_t=float, stat_accscalar_t=float, index_t=int32_t]" at line 654 instantiation of "std::tuple at::native::batch_norm_backward_cuda_template(const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, const 
at::Tensor &, const at::Tensor &, __nv_bool, double, std::array<__nv_bool, 3UL>) [with input_scalar_t=c10::Half, stat_scalar_t=float, index_t=int32_t]" at line 585 of /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cu /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cuh(122): warning #20054-D: dynamic initialization is not supported for a function-scope static __shared__ variable within a __device__/__global__ function scalar_t shared[32]; ^ detected during: instantiation of "scalar_t at::native::reduce(Op, PTA, int) [with scalar_t=at::native::Float2, Op=at::native::GradOp>, PTA=at::GenericPackedTensorAccessor]" at line 389 instantiation of "void at::native::batch_norm_backward_kernel(at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, __nv_bool, stat_accscalar_t) [with input_scalar_t=c10::BFloat16, stat_scalar_t=float, stat_accscalar_t=float, index_t=int32_t]" at line 654 instantiation of "std::tuple at::native::batch_norm_backward_cuda_template(const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, __nv_bool, double, std::array<__nv_bool, 3UL>) [with input_scalar_t=c10::BFloat16, stat_scalar_t=float, index_t=int32_t]" at line 585 of /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cu /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cuh(122): warning #20054-D: dynamic initialization is not supported for a function-scope static __shared__ variable within a __device__/__global__ function scalar_t shared[32]; ^ detected during: instantiation of "scalar_t at::native::reduce(Op, PTA, int) [with scalar_t=at::native::Float2, 
Op=at::native::GradOp>, PTA=at::GenericPackedTensorAccessor]" at line 489 instantiation of "void at::native::batch_norm_backward_reduce_kernel(at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor) [with input_scalar_t=double, stat_scalar_t=double, stat_accscalar_t=double, index_t=int32_t]" at line 831 instantiation of "std::tuple at::native::batch_norm_backward_reduce_cuda_template(const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, __nv_bool, __nv_bool, __nv_bool) [with input_scalar_t=double, stat_scalar_t=double, index_t=int32_t]" at line 739 of /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cu /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cuh(122): warning #20054-D: dynamic initialization is not supported for a function-scope static __shared__ variable within a __device__/__global__ function scalar_t shared[32]; ^ detected during: instantiation of "scalar_t at::native::reduce(Op, PTA, int) [with scalar_t=at::native::Float2, Op=at::native::GradOp>, PTA=at::GenericPackedTensorAccessor]" at line 489 instantiation of "void at::native::batch_norm_backward_reduce_kernel(at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor) [with input_scalar_t=double, stat_scalar_t=double, stat_accscalar_t=double, index_t=int64_t]" at line 831 instantiation of "std::tuple at::native::batch_norm_backward_reduce_cuda_template(const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, __nv_bool, __nv_bool, __nv_bool) [with 
input_scalar_t=double, stat_scalar_t=double, index_t=int64_t]" at line 739 of /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cu /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cuh(122): warning #20054-D: dynamic initialization is not supported for a function-scope static __shared__ variable within a __device__/__global__ function scalar_t shared[32]; ^ detected during: instantiation of "scalar_t at::native::reduce(Op, PTA, int) [with scalar_t=at::native::Float2, Op=at::native::GradOp>, PTA=at::GenericPackedTensorAccessor]" at line 489 instantiation of "void at::native::batch_norm_backward_reduce_kernel(at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor) [with input_scalar_t=float, stat_scalar_t=float, stat_accscalar_t=float, index_t=int32_t]" at line 831 instantiation of "std::tuple at::native::batch_norm_backward_reduce_cuda_template(const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, __nv_bool, __nv_bool, __nv_bool) [with input_scalar_t=float, stat_scalar_t=float, index_t=int32_t]" at line 739 of /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cu /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cuh(122): warning #20054-D: dynamic initialization is not supported for a function-scope static __shared__ variable within a __device__/__global__ function scalar_t shared[32]; ^ detected during: instantiation of "scalar_t at::native::reduce(Op, PTA, int) [with scalar_t=at::native::Float2, Op=at::native::GradOp>, PTA=at::GenericPackedTensorAccessor]" at line 489 instantiation of "void at::native::batch_norm_backward_reduce_kernel(at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, 
at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor, at::GenericPackedTensorAccessor) [with input_scalar_t=float, stat_scalar_t=float, stat_accscalar_t=float, index_t=int64_t]" at line 831
            instantiation of "std::tuple at::native::batch_norm_backward_reduce_cuda_template(const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, const at::Tensor &, __nv_bool, __nv_bool, __nv_bool) [with input_scalar_t=float, stat_scalar_t=float, index_t=int64_t]" at line 739 of /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cu
/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cuh(122): warning #20054-D: dynamic initialization is not supported for a function-scope static __shared__ variable within a __device__/__global__ function
      scalar_t shared[32];
               ^
Remark: The warnings can be suppressed with "-diag-suppress <error-number>"
The same warning recurs for every remaining instantiation of "scalar_t at::native::reduce(Op, PTA, int)" with scalar_t=at::native::Float2, reached from two call sites in /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/Normalization.cu:
  - at::native::batch_norm_backward_reduce_kernel (Normalization.cuh line 489, kernel at line 831) via batch_norm_backward_reduce_cuda_template (line 739), for input_scalar_t in {double, float, c10::Half, c10::BFloat16} and index_t in {int32_t, int64_t} (stat_scalar_t is double for double inputs, float otherwise)
  - at::native::batch_norm_backward_kernel (Normalization.cuh line 389, kernel at line 654) via batch_norm_backward_cuda_template (line 585), for input_scalar_t in {double, float, c10::Half, c10::BFloat16} with index_t=int32_t
[ 93%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/RNN.cu.o
[ 93%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/Randperm.cu.o
[ 93%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/RangeFactories.cu.o
[ 93%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/RecordStream.cu.o
[ 93%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/Reduce.cu.o
[ 93%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/ReduceAMinMaxKernel.cu.o
[ 93%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/ReduceArgMaxKernel.cu.o
[ 93%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/ReduceArgMinKernel.cu.o
[ 93%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/ReduceLogicKernel.cu.o
[ 93%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/ReduceMaxValuesKernel.cu.o
[ 93%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/ReduceMinValuesKernel.cu.o
[ 93%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/ReduceMomentKernel.cu.o
[ 93%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/ReduceNormKernel.cu.o
[ 93%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/ReduceSumProdKernel.cu.o
[ 94%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/ReflectionPad.cu.o
[ 94%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/RenormKernel.cu.o
[ 94%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/Repeat.cu.o
[ 94%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/ReplicationPadding.cu.o
[ 94%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/RreluWithNoise.cu.o
[ 94%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/ScatterGatherKernel.cu.o
[ 94%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/SegmentReduce.cu.o
[ 94%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/Shape.cu.o
[ 94%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/SoftMax.cu.o
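The repeated #20054-D diagnostics above are triggered because the batch-norm reduction kernels declare a `__shared__` array whose element type (`at::native::Float2`) has a user-provided constructor; CUDA does not run constructors for shared-memory variables. A minimal hypothetical sketch (not taken from the PyTorch sources) that reproduces the same warning, and the usual workaround of initializing per thread:

```cuda
#include <cstdio>

// Pair type with a user-provided default constructor, analogous to
// at::native::Float2 in Normalization.cuh. The constructor makes the
// type non-trivially default-constructible.
struct Float2 {
    float v1, v2;
    __device__ Float2() : v1(0.f), v2(0.f) {}  // non-trivial ctor -> #20054-D
};

__global__ void reduce_kernel() {
    // nvcc warns here: "dynamic initialization is not supported for a
    // function-scope static __shared__ variable". The constructor is never
    // executed, so the array starts out uninitialized.
    __shared__ Float2 shared[32];

    // Workaround: have threads initialize their slots explicitly.
    if (threadIdx.x < 32) {
        shared[threadIdx.x] = Float2();
    }
    __syncthreads();
}

int main() {
    reduce_kernel<<<1, 32>>>();
    cudaDeviceSynchronize();
    return 0;
}
```

Since the constructor would not have run anyway, the warning is benign for code that already writes every shared slot before reading it, which is why the build proceeds; it can also be silenced with `-diag-suppress 20054` as the remark suggests.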
/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu(844): warning #191-D: type qualifier is meaningless on cast type
[... the diagnostic echoes the full preprocessor expansion of the "host_softmax" dtype dispatch at SoftMax.cu:844 — a switch over at::ScalarType with cases Double, Float, Half, and BFloat16, each launching cunn_SoftMaxForwardSmem or cunn_SoftMaxForward depending on shared-memory fit, plus a default case that fails via ::c10::detail::torchCheckFail; the template arguments inside the expansion were stripped from the captured log ...]
Remark: The warnings can be suppressed with "-diag-suppress <error-number>"
/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu(844): warning #191-D: type qualifier is meaningless on cast type [&] { const auto& the_type = input.scalar_type(); constexpr const char* at_dispatch_name = "host_softmax"; at::ScalarType _st = ::detail::scalar_type(the_type); ; switch (_st) { case at::ScalarType::Double: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Double)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Double), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto
max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } 
do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Float: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Float)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Float), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= 
!(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Half: { do { if constexpr 
(!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Half)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Half), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); 
c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::BFloat16: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::BFloat16)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " 
"false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::BFloat16), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && 
dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } default: do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? 
If so, " "please report an enhancement request to PyTorch.)", ::c10::str('"', at_dispatch_name, "\" not implemented for '", toString(_st), "'")))); }; } while (false); } }() ^ /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu(844): warning #191-D: type qualifier is meaningless on cast type [&] { const auto& the_type = input.scalar_type(); constexpr const char* at_dispatch_name = "host_softmax"; at::ScalarType _st = ::detail::scalar_type(the_type); ; switch (_st) { case at::ScalarType::Double: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Double)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Double), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = 
(at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t 
__err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Float: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Float)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Float), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if 
(can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Half: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, 
at::ScalarType::Half)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Half), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( 
static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::BFloat16: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::BFloat16)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. 
" "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::BFloat16), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t 
remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } default: do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? 
If so, " "please report an enhancement request to PyTorch.)", ::c10::str('"', at_dispatch_name, "\" not implemented for '", toString(_st), "'")))); }; } while (false); } }() ^ /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu(844): warning #191-D: type qualifier is meaningless on cast type [&] { const auto& the_type = input.scalar_type(); constexpr const char* at_dispatch_name = "host_softmax"; at::ScalarType _st = ::detail::scalar_type(the_type); ; switch (_st) { case at::ScalarType::Double: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Double)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Double), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem =
(at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t 
__err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Float: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Float)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Float), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if 
(can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Half: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, 
at::ScalarType::Half)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Half), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( 
static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::BFloat16: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::BFloat16)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. 
" "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::BFloat16), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t 
remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } default: do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? 
If so, " "please report an enhancement request to PyTorch.)", ::c10::str('"', at_dispatch_name, "\" not implemented for '", toString(_st), "'")))); }; } while (false); } }() ^ /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu(844): warning #191-D: type qualifier is meaningless on cast type
/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu(844): warning #191-D: type qualifier is meaningless on cast type [&] { const auto& the_type = input.scalar_type(); constexpr const char* at_dispatch_name = "host_softmax"; at::ScalarType _st = ::detail::scalar_type(the_type); ; switch (_st) { case at::ScalarType::Double: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Double)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Double), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem =
(at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t 
__err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Float: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Float)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Float), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if 
(can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Half: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, 
at::ScalarType::Half)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Half), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( 
static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::BFloat16: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::BFloat16)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. 
" "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::BFloat16), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t 
remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } default: do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? 
If so, " "please report an enhancement request to PyTorch.)", ::c10::str('"', at_dispatch_name, "\" not implemented for '", toString(_st), "'")))); }; } while (false); } }() ^ Remark: The warnings can be suppressed with "-diag-suppress " /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu(844): warning #191-D: type qualifier is meaningless on cast type [&] { const auto& the_type = input.scalar_type(); constexpr const char* at_dispatch_name = "host_softmax"; at::ScalarType _st = ::detail::scalar_type(the_type); ; switch (_st) { case at::ScalarType::Double: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Double)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Double), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto 
max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } 
do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Float: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Float)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Float), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= 
!(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Half: { do { if constexpr 
(!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Half)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Half), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); 
c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::BFloat16: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::BFloat16)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " 
"false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::BFloat16), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && 
dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } default: do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? 
If so, " "please report an enhancement request to PyTorch.)", ::c10::str('"', at_dispatch_name, "\" not implemented for '", toString(_st), "'")))); }; } while (false); } }() ^
/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu(844): warning #191-D: type qualifier is meaningless on cast type
/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu(844): warning #191-D: type qualifier is meaningless on cast type
" "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::BFloat16), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t 
remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } default: do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? 
If so, " "please report an enhancement request to PyTorch.)", ::c10::str('"', at_dispatch_name, "\" not implemented for '", toString(_st), "'")))); }; } while (false); } }() ^ /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu(844): warning #191-D: type qualifier is meaningless on cast type [&] { const auto& the_type = input.scalar_type(); constexpr const char* at_dispatch_name = "host_softmax"; at::ScalarType _st = ::detail::scalar_type(the_type); ; switch (_st) { case at::ScalarType::Double: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Double)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Double), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem =
(at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t 
__err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Float: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Float)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Float), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if 
(can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Half: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, 
at::ScalarType::Half)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Half), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( 
static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::BFloat16: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::BFloat16)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. 
" "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::BFloat16), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t 
remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } default: do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? 
If so, " "please report an enhancement request to PyTorch.)", ::c10::str('"', at_dispatch_name, "\" not implemented for '", toString(_st), "'")))); }; } while (false); } }() ^ /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu(844): warning #191-D: type qualifier is meaningless on cast type [&] { const auto& the_type = input.scalar_type(); constexpr const char* at_dispatch_name = "host_softmax"; at::ScalarType _st = ::detail::scalar_type(the_type); ; switch (_st) { case at::ScalarType::Double: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Double)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Double), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem =
(at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t 
__err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Float: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Float)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Float), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if 
(can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Half: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, 
at::ScalarType::Half)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Half), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( 
static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::BFloat16: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::BFloat16)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. 
" "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::BFloat16), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t 
remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } default: do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? 
If so, " "please report an enhancement request to PyTorch.)", ::c10::str('"', at_dispatch_name, "\" not implemented for '", toString(_st), "'")))); }; } while (false); } }()
^

/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu(844): warning #191-D: type qualifier is meaningless on cast type

Remark: The warnings can be suppressed with "-diag-suppress <error-number>"

/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu(844): warning #191-D: type qualifier is meaningless on cast type

/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu(844): warning #191-D: type qualifier is meaningless on cast type
(at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t 
__err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Float: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Float)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Float), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if 
(can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Half: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, 
at::ScalarType::Half)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Half), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( 
static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::BFloat16: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::BFloat16)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. 
" "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::BFloat16), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t 
remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } default: do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? 
If so, " "please report an enhancement request to PyTorch.)", ::c10::str('"', at_dispatch_name, "\" not implemented for '", toString(_st), "'")))); }; } while (false); } }() ^ /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu(844): warning #191-D: type qualifier is meaningless on cast type [&] { const auto& the_type = input.scalar_type(); constexpr const char* at_dispatch_name = "host_softmax"; at::ScalarType _st = ::detail::scalar_type(the_type); ; switch (_st) { case at::ScalarType::Double: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Double)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Double), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = 
(at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t 
__err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Float: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Float)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Float), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if 
(can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Half: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, 
at::ScalarType::Half)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Half), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( 
static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::BFloat16: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::BFloat16)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. 
" "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::BFloat16), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t 
remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } default: do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? 
If so, " "please report an enhancement request to PyTorch.)", ::c10::str('"', at_dispatch_name, "\" not implemented for '", toString(_st), "'")))); }; } while (false); } }() ^ /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu(844): warning #191-D: type qualifier is meaningless on cast type [&] { const auto& the_type = input.scalar_type(); constexpr const char* at_dispatch_name = "host_softmax"; at::ScalarType _st = ::detail::scalar_type(the_type); ; switch (_st) { case at::ScalarType::Double: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Double)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Double), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem =
(at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t 
__err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Float: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Float)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Float), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if 
(can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Half: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, 
at::ScalarType::Half)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Half), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( 
static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::BFloat16: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::BFloat16)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. 
" "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::BFloat16), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t 
remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } default: do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? 
If so, " "please report an enhancement request to PyTorch.)", ::c10::str('"', at_dispatch_name, "\" not implemented for '", toString(_st), "'")))); }; } while (false); } }() ^ /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu(844): warning #191-D: type qualifier is meaningless on cast type [&] { const auto& the_type = input.scalar_type(); constexpr const char* at_dispatch_name = "host_softmax"; at::ScalarType _st = ::detail::scalar_type(the_type); ; switch (_st) { case at::ScalarType::Double: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Double)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Double), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = 
(at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t 
__err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Float: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Float)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Float), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if 
(can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Half: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, 
at::ScalarType::Half)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Half), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( 
static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::BFloat16: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::BFloat16)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. 
" "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::BFloat16), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t 
remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } default: do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? 
If so, " "please report an enhancement request to PyTorch.)", ::c10::str('"', at_dispatch_name, "\" not implemented for '", toString(_st), "'")))); }; } while (false); } }() ^ [ 94%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/Sort.cu.o /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu(844): warning #191-D: type qualifier is meaningless on cast type [&] { const auto& the_type = input.scalar_type(); constexpr const char* at_dispatch_name = "host_softmax"; at::ScalarType _st = ::detail::scalar_type(the_type); ; switch (_st) { case at::ScalarType::Double: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Double)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? 
If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Double), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = 
(1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Float: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Float)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? 
If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Float), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = 
(1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Half: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Half)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? 
If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Half), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = 
(1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::BFloat16: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::BFloat16)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? 
If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::BFloat16), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = 
(1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } default: do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? 
If so, " "please report an enhancement request to PyTorch.)", ::c10::str('"', at_dispatch_name, "\" not implemented for '", toString(_st), "'")))); }; } while (false); } }() ^ Remark: The warnings can be suppressed with "-diag-suppress " /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu(844): warning #191-D: type qualifier is meaningless on cast type [&] { const auto& the_type = input.scalar_type(); constexpr const char* at_dispatch_name = "host_softmax"; at::ScalarType _st = ::detail::scalar_type(the_type); ; switch (_st) { case at::ScalarType::Double: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Double)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Double), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto 
max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } 
do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Float: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Float)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Float), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= 
!(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Half: { do { if constexpr 
(!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Half)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Half), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); 
c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::BFloat16: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::BFloat16)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " 
"false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::BFloat16), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && 
dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } default: do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? 
If so, " "please report an enhancement request to PyTorch.)", ::c10::str('"', at_dispatch_name, "\" not implemented for '", toString(_st), "'")))); }; } while (false); } }() ^ /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu(844): warning #191-D: type qualifier is meaningless on cast type [&] { const auto& the_type = input.scalar_type(); constexpr const char* at_dispatch_name = "host_softmax"; at::ScalarType _st = ::detail::scalar_type(the_type); ; switch (_st) { case at::ScalarType::Double: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Double)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Double), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = 
(at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t 
__err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Float: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Float)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Float), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if 
(can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Half: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, 
at::ScalarType::Half)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Half), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( 
static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::BFloat16: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::BFloat16)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. 
" "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::BFloat16), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t 
remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } default: do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? 
If so, " "please report an enhancement request to PyTorch.)", ::c10::str('"', at_dispatch_name, "\" not implemented for '", toString(_st), "'")))); }; } while (false); } }() ^ /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu(844): warning #191-D: type qualifier is meaningless on cast type [&] { const auto& the_type = input.scalar_type(); constexpr const char* at_dispatch_name = "host_softmax"; at::ScalarType _st = ::detail::scalar_type(the_type); ; switch (_st) { case at::ScalarType::Double: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Double)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Double), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = 
(at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t 
__err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Float: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Float)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Float), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if 
(can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Half: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, 
at::ScalarType::Half)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Half), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( 
static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::BFloat16: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::BFloat16)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. 
" "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::BFloat16), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t 
remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } default: do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? 
If so, " "please report an enhancement request to PyTorch.)", ::c10::str('"', at_dispatch_name, "\" not implemented for '", toString(_st), "'")))); }; } while (false); } }() ^ /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu(844): warning #191-D: type qualifier is meaningless on cast type [&] { const auto& the_type = input.scalar_type(); constexpr const char* at_dispatch_name = "host_softmax"; at::ScalarType _st = ::detail::scalar_type(the_type); ; switch (_st) { case at::ScalarType::Double: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Double)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Double), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem =
(at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t 
__err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Float: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Float)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Float), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if 
(can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Half: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, 
at::ScalarType::Half)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Half), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( 
static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::BFloat16: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::BFloat16)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. 
" "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::BFloat16), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t 
remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } default: do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? 
If so, " "please report an enhancement request to PyTorch.)", ::c10::str('"', at_dispatch_name, "\" not implemented for '", toString(_st), "'")))); }; } while (false); } }() ^ /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu(844): warning #191-D: type qualifier is meaningless on cast type [&] { const auto& the_type = input.scalar_type(); constexpr const char* at_dispatch_name = "host_softmax"; at::ScalarType _st = ::detail::scalar_type(the_type); ; switch (_st) { case at::ScalarType::Double: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Double)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Double), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = 
(at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t 
__err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Float: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Float)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Float), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if 
(can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Half: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, 
at::ScalarType::Half)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Half), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( 
static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::BFloat16: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::BFloat16)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. 
" "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::BFloat16), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t 
remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } default: do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? 
If so, " "please report an enhancement request to PyTorch.)", ::c10::str('"', at_dispatch_name, "\" not implemented for '", toString(_st), "'")))); }; } while (false); } }() ^ /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu(844): warning #191-D: type qualifier is meaningless on cast type [&] { const auto& the_type = input.scalar_type(); constexpr const char* at_dispatch_name = "host_softmax"; at::ScalarType _st = ::detail::scalar_type(the_type); ; switch (_st) { case at::ScalarType::Double: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Double)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Double), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = 
(at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t 
__err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Float: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Float)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Float), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if 
(can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Half: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, 
at::ScalarType::Half)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Half), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( 
static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::BFloat16: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::BFloat16)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. 
" "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::BFloat16), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t 
remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } default: do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? 
If so, " "please report an enhancement request to PyTorch.)", ::c10::str('"', at_dispatch_name, "\" not implemented for '", toString(_st), "'")))); }; } while (false); } }() ^ Remark: The warnings can be suppressed with "-diag-suppress " /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu(844): warning #191-D: type qualifier is meaningless on cast type
/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu(844): warning #191-D: type qualifier is meaningless on cast type [&] { const auto& the_type = input.scalar_type(); constexpr const char* at_dispatch_name = "host_softmax"; at::ScalarType _st = ::detail::scalar_type(the_type); ; switch (_st) { case at::ScalarType::Double: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Double)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Double), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem =
(at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t 
__err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Float: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Float)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Float), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if 
(can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Half: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, 
at::ScalarType::Half)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Half), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( 
static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::BFloat16: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::BFloat16)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. 
" "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::BFloat16), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t 
remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } default: do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? 
If so, " "please report an enhancement request to PyTorch.)", ::c10::str('"', at_dispatch_name, "\" not implemented for '", toString(_st), "'")))); }; } while (false); } }() ^
/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu(844): warning #191-D: type qualifier is meaningless on cast type [&] { const auto& the_type = input.scalar_type(); constexpr const char* at_dispatch_name = "host_softmax"; at::ScalarType _st = ::detail::scalar_type(the_type); ; switch (_st) { case at::ScalarType::Double: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Double)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Double), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem =
(at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t 
__err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Float: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Float)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Float), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if 
(can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Half: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, 
at::ScalarType::Half)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Half), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( 
static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::BFloat16: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::BFloat16)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. 
" "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::BFloat16), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t 
remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } default: do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? 
If so, " "please report an enhancement request to PyTorch.)", ::c10::str('"', at_dispatch_name, "\" not implemented for '", toString(_st), "'")))); }; } while (false); } }() ^ /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu(844): warning #191-D: type qualifier is meaningless on cast type [&] { const auto& the_type = input.scalar_type(); constexpr const char* at_dispatch_name = "host_softmax"; at::ScalarType _st = ::detail::scalar_type(the_type); ; switch (_st) { case at::ScalarType::Double: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Double)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Double), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem =
(at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t 
__err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Float: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Float)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Float), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if 
(can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Half: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, 
at::ScalarType::Half)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Half), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( 
static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::BFloat16: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::BFloat16)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. 
" "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::BFloat16), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t 
remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } default: do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? 
If so, " "please report an enhancement request to PyTorch.)", ::c10::str('"', at_dispatch_name, "\" not implemented for '", toString(_st), "'")))); }; } while (false); } }() ^ /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu(844): warning #191-D: type qualifier is meaningless on cast type [&] { const auto& the_type = input.scalar_type(); constexpr const char* at_dispatch_name = "host_softmax"; at::ScalarType _st = ::detail::scalar_type(the_type); ; switch (_st) { case at::ScalarType::Double: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Double)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Double), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = 
(at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t 
__err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Float: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Float)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Float), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if 
(can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Half: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, 
at::ScalarType::Half)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Half), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( 
static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::BFloat16: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::BFloat16)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. 
" "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::BFloat16), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t 
remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } default: do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? 
If so, " "please report an enhancement request to PyTorch.)", ::c10::str('"', at_dispatch_name, "\" not implemented for '", toString(_st), "'")))); }; } while (false); } }() ^ Remark: The warnings can be suppressed with "-diag-suppress " /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu(844): warning #191-D: type qualifier is meaningless on cast type [&] { const auto& the_type = input.scalar_type(); constexpr const char* at_dispatch_name = "host_softmax"; at::ScalarType _st = ::detail::scalar_type(the_type); ; switch (_st) { case at::ScalarType::Double: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Double)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Double), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto 
max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } 
do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Float: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Float)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Float), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= 
!(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Half: { do { if constexpr 
(!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Half)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Half), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); 
c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::BFloat16: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::BFloat16)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " 
"false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::BFloat16), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && 
dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } default: do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? 
If so, " "please report an enhancement request to PyTorch.)", ::c10::str('"', at_dispatch_name, "\" not implemented for '", toString(_st), "'")))); }; } while (false); } }() ^ /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu(844): warning #191-D: type qualifier is meaningless on cast type [&] { const auto& the_type = input.scalar_type(); constexpr const char* at_dispatch_name = "host_softmax"; at::ScalarType _st = ::detail::scalar_type(the_type); ; switch (_st) { case at::ScalarType::Double: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Double)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Double), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem =
(at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t 
__err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Float: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Float)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Float), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if 
(can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Half: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, 
at::ScalarType::Half)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Half), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( 
static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::BFloat16: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::BFloat16)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. 
" "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::BFloat16), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t 
remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } default: do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? 
If so, " "please report an enhancement request to PyTorch.)", ::c10::str('"', at_dispatch_name, "\" not implemented for '", toString(_st), "'")))); }; } while (false); } }() ^ /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu(844): warning #191-D: type qualifier is meaningless on cast type
/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu(844): warning #191-D: type qualifier is meaningless on cast type [&] { const auto& the_type = input.scalar_type(); constexpr const char* at_dispatch_name = "host_softmax"; at::ScalarType _st = ::detail::scalar_type(the_type); ; switch (_st) { case at::ScalarType::Double: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Double)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Double), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem =
              (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t);
          bool can_use_smem = dim_size < max_elements_per_smem;
          can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES);
          can_use_smem &= !(reinterpret_cast(output_ptr) % ALIGN_BYTES);
          can_use_smem &= !(dim_size % ILP);
          if (can_use_smem) {
            size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz;
            cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size);
          } else {
            cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size);
          }
          do {
            const cudaError_t __err = cudaGetLastError();
            c10::cuda::c10_cuda_check_implementation(
                static_cast(__err),
                "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu",
                __func__, static_cast(880), true);
          } while (0);
        }
      } else {
        auto output_ptr = output.mutable_data_ptr();
        auto input_ptr = input.const_data_ptr();
        if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) {
          int64_t remaining = outer_size;
          int64_t chunk_size = (1<<30) / dim_size;
          while (remaining > 0) {
            dispatch_softmax_forward(
                output_ptr, input_ptr, dim_size, dim_size,
                std::min(remaining, chunk_size), nullptr);
            input_ptr += chunk_size * dim_size;
            output_ptr += chunk_size * dim_size;
            remaining -= chunk_size;
          }
        } else {
          constexpr int ILP = sizeof(float4) / sizeof(scalar_t);
          dim3 block = SoftMaxForward_getBlockSize(dim_size);
          size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t);
          auto max_elements_per_smem =
              (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t);
          bool can_use_smem = dim_size < max_elements_per_smem;
          can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES);
          can_use_smem &= !(reinterpret_cast(output_ptr) % ALIGN_BYTES);
          can_use_smem &= !(dim_size % ILP);
          if (can_use_smem) {
            size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz;
            cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size);
          } else {
            cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size);
          }
          do {
            const cudaError_t __err = cudaGetLastError();
            c10::cuda::c10_cuda_check_implementation(
                static_cast(__err),
                "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu",
                __func__, static_cast(916), true);
          } while (0);
        }
      }
    }();
  }
  case at::ScalarType::Float: {
    /* [... expansion identical to the Double case above, instantiated for Float (error checks at lines 880/916) ...] */
  }
  case at::ScalarType::Half: {
    /* [... expansion identical to the Double case above, instantiated for Half ...] */
  }
  case at::ScalarType::BFloat16: {
    /* [... expansion identical to the Double case above, instantiated for BFloat16 ...] */
  }
  default:
    do {
      ::c10::detail::deprecated_AT_ERROR();
      if (!(false)) {
        ::c10::detail::torchCheckFail(
            __func__,
            "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu",
            static_cast(844),
            (::c10::detail::torchCheckMsgImpl(
                "Expected " "false" " to be true, but got false. " "(Could this error message be improved?
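The expanded dispatch code above decides between a shared-memory softmax kernel and a plain one: the row must fit in shared memory after reserving per-warp reduction scratch, both pointers must be vector-aligned, and the row length must divide by the per-thread vector width (ILP). A standalone host-side sketch of that feasibility test, with `ALIGN_BYTES = 16` (sizeof(float4)) assumed and `shared_mem_per_block` standing in for the `cudaDeviceProp::sharedMemPerBlock` query, not the actual PyTorch helper:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Sketch of the can_use_smem test visible in the expanded code above.
// ALIGN_BYTES mirrors sizeof(float4); all parameter names are illustrative.
constexpr std::size_t ALIGN_BYTES = 16;

bool can_use_smem(std::size_t dim_size, std::size_t elem_size,
                  std::size_t shared_mem_per_block,
                  std::size_t smem_reduction_sz,
                  const void* in, const void* out, std::size_t ilp) {
  // Row elements that fit after reserving the per-warp reduction scratch.
  std::size_t max_elements =
      (shared_mem_per_block - smem_reduction_sz) / elem_size;
  bool ok = dim_size < max_elements;
  // Both pointers must be float4-aligned for vectorized loads/stores.
  ok &= reinterpret_cast<std::uintptr_t>(in) % ALIGN_BYTES == 0;
  ok &= reinterpret_cast<std::uintptr_t>(out) % ALIGN_BYTES == 0;
  // The row length must be a multiple of the per-thread vector width.
  ok &= dim_size % ilp == 0;
  return ok;
}
```

If any condition fails, the code falls back to the non-shared-memory kernel rather than erroring out.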
If so, " "please report an enhancement request to PyTorch.)", ::c10::str('"', at_dispatch_name, "\" not implemented for '", toString(_st), "'")))); }; } while (false); } }() ^ /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu(844): warning #191-D: type qualifier is meaningless on cast type
[... macro expansion of the "host_softmax" dispatch switch repeated verbatim for this diagnostic ...] }() ^ /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu(844): warning #191-D: type qualifier is meaningless on cast type
[... macro expansion of the "host_softmax" dispatch switch repeated verbatim ...] (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. 
" "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::BFloat16), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t 
remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } default: do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? 
If so, " "please report an enhancement request to PyTorch.)", ::c10::str('"', at_dispatch_name, "\" not implemented for '", toString(_st), "'")))); }; } while (false); } }() ^ [ 94%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/SortImpl.cu.o /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu(844): warning #191-D: type qualifier is meaningless on cast type [&] { const auto& the_type = input.scalar_type(); constexpr const char* at_dispatch_name = "host_softmax"; at::ScalarType _st = ::detail::scalar_type(the_type); ; switch (_st) { case at::ScalarType::Double: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Double)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? 
If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Double), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = 
(1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Float: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Float)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? 
If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Float), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = 
(1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Half: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Half)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? 
If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Half), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = 
(1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::BFloat16: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::BFloat16)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? 
If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::BFloat16), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = 
(1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } default: do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? 
If so, " "please report an enhancement request to PyTorch.)", ::c10::str('"', at_dispatch_name, "\" not implemented for '", toString(_st), "'")))); }; } while (false); } }() ^ Remark: The warnings can be suppressed with "-diag-suppress " /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu(844): warning #191-D: type qualifier is meaningless on cast type [&] { const auto& the_type = input.scalar_type(); constexpr const char* at_dispatch_name = "host_softmax"; at::ScalarType _st = ::detail::scalar_type(the_type); ; switch (_st) { case at::ScalarType::Double: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Double)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Double), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto 
max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } 
do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Float: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Float)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Float), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= 
!(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Half: { do { if constexpr 
(!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Half)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Half), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); 
c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::BFloat16: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::BFloat16)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " 
"false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::BFloat16), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && 
dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } default: do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? 
If so, " "please report an enhancement request to PyTorch.)", ::c10::str('"', at_dispatch_name, "\" not implemented for '", toString(_st), "'")))); }; } while (false); } }() ^
If so, " "please report an enhancement request to PyTorch.)", ::c10::str('"', at_dispatch_name, "\" not implemented for '", toString(_st), "'")))); }; } while (false); } }()
^
/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu(844): warning #191-D: type qualifier is meaningless on cast type
/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu(844): warning #191-D: type qualifier is meaningless on cast type
/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu(844): warning #191-D: type qualifier is meaningless on cast type
[&] { const auto& the_type = input.scalar_type(); constexpr const char* at_dispatch_name = "host_softmax"; at::ScalarType _st = ::detail::scalar_type(the_type); ; switch (_st) { case at::ScalarType::Double: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Double)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Double), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem =
(at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t 
__err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Float: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Float)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Float), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if 
(can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Half: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, 
at::ScalarType::Half)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Half), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( 
static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::BFloat16: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::BFloat16)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. 
" "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::BFloat16), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t 
remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } default: do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? 
If so, " "please report an enhancement request to PyTorch.)", ::c10::str('"', at_dispatch_name, "\" not implemented for '", toString(_st), "'")))); }; } while (false); } }() ^ /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu(844): warning #191-D: type qualifier is meaningless on cast type [&] { const auto& the_type = input.scalar_type(); constexpr const char* at_dispatch_name = "host_softmax"; at::ScalarType _st = ::detail::scalar_type(the_type); ; switch (_st) { case at::ScalarType::Double: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Double)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Double), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = 
(at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t 
__err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Float: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Float)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Float), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if 
(can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Half: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, 
at::ScalarType::Half)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Half), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( 
static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::BFloat16: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::BFloat16)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. 
" "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::BFloat16), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t 
remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } default: do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? 
If so, " "please report an enhancement request to PyTorch.)", ::c10::str('"', at_dispatch_name, "\" not implemented for '", toString(_st), "'")))); }; } while (false); } }() ^ [ 94%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/SortStable.cu.o /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu: In instantiation of ‘at::Tensor at::native::_GLOBAL__N__08542f1a_10_SoftMax_cu_9f978f63::host_softmax(const at::Tensor&, int64_t, bool, const at::Tensor&) [with Epilogue = LogSoftMaxForwardEpilogue; bool is_log_softmax = true; int64_t = long int]’: /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu:1072:56: required from here /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu:844:2132: warning: comparison of integer expressions of different signedness: ‘int64_t’ {aka ‘long int’} and ‘long unsigned int’ [-Wsign-compare] 844 | AT_DISPATCH_FLOATING_TYPES_AND2(at::ScalarType::Half, at::ScalarType::BFloat16, input.scalar_type(), "host_softmax", [&] { | ^ /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu: In instantiation of ‘at::Tensor at::native::_GLOBAL__N__08542f1a_10_SoftMax_cu_9f978f63::host_softmax(const at::Tensor&, int64_t, bool, const at::Tensor&) [with Epilogue = SoftMaxForwardEpilogue; bool is_log_softmax = false; int64_t = long int]’: /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu:1096:54: required from here /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu:844:2132: warning: comparison of integer expressions of different signedness: ‘int64_t’ {aka ‘long int’} and ‘long unsigned int’ [-Wsign-compare] 844 | AT_DISPATCH_FLOATING_TYPES_AND2(at::ScalarType::Half, at::ScalarType::BFloat16, input.scalar_type(), "host_softmax", [&] { | ^ [ 94%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/Sorting.cu.o [ 94%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/SparseBinaryOpIntersectionKernel.cu.o [ 94%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/SparseMM.cu.o [ 94%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/SpectralOps.cu.o [ 94%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/StepKernel.cu.o [ 94%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/SummaryOps.cu.o [ 94%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/TensorCompare.cu.o [ 94%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/TensorModeKernel.cu.o [ 94%] Building CUDA object
caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/TensorShape.cu.o [ 94%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/TensorTopK.cu.o [ 94%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/TensorTransformations.cu.o [ 94%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/TriangularOps.cu.o [ 94%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/UnaryComplexKernels.cu.o [ 94%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/UnaryFractionKernels.cu.o [ 94%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/UnaryGammaKernels.cu.o [ 94%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/UnaryGeometricAcosKernel.cu.o [ 94%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/UnaryGeometricAcoshKernel.cu.o [ 94%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/UnaryGeometricAsinKernel.cu.o [ 94%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/UnaryGeometricAsinhKernel.cu.o [ 94%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/UnaryGeometricAtanKernel.cu.o [ 94%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/UnaryGeometricAtanhKernel.cu.o [ 94%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/UnaryGeometricCosKernel.cu.o [ 94%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/UnaryGeometricCoshKernel.cu.o [ 94%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/UnaryGeometricSinKernel.cu.o [ 94%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/UnaryGeometricSinhKernel.cu.o [ 94%] Building CUDA object 
caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/UnaryGeometricTanKernel.cu.o [ 94%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/UnaryGeometricTanhKernel.cu.o [ 94%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/UnaryLogKernels.cu.o [ 94%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/UnaryOpsKernel.cu.o [ 94%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/UnarySignKernels.cu.o [ 94%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/UnarySpecialOpsKernel.cu.o [ 94%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/UnfoldBackwardKernel.cu.o [ 94%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/UniqueCub.cu.o [ 94%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/UpSampleBicubic2d.cu.o [ 94%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/UpSampleBilinear2d.cu.o [ 94%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/UpSampleLinear1d.cu.o [ 94%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/UpSampleNearest1d.cu.o [ 94%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/UpSampleNearest2d.cu.o [ 94%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/UpSampleNearest3d.cu.o [ 94%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/UpSampleTrilinear3d.cu.o [ 94%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/ValidateCompressedIndicesKernel.cu.o [ 94%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/WeightNorm.cu.o [ 94%] Building CUDA object 
caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/ZetaKernel.cu.o [ 94%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/airy_ai.cu.o [ 94%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/bessel_j0.cu.o [ 94%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/bessel_j1.cu.o [ 94%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/bessel_y0.cu.o [ 95%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/bessel_y1.cu.o [ 95%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/chebyshev_polynomial_t.cu.o [ 95%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/chebyshev_polynomial_u.cu.o [ 95%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/chebyshev_polynomial_v.cu.o [ 95%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/chebyshev_polynomial_w.cu.o [ 95%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/fused_adam_amsgrad_impl.cu.o [ 95%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/fused_adam_impl.cu.o [ 95%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/fused_adamw_amsgrad_impl.cu.o [ 95%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/fused_adamw_impl.cu.o [ 95%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/group_norm_kernel.cu.o [ 95%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/hermite_polynomial_h.cu.o [ 95%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/hermite_polynomial_he.cu.o [ 95%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/int4mm.cu.o [ 95%] Building 
CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/laguerre_polynomial_l.cu.o [ 95%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/layer_norm_kernel.cu.o [ 95%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/legendre_polynomial_p.cu.o [ 95%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/modified_bessel_i0.cu.o [ 95%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/modified_bessel_i1.cu.o [ 95%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/modified_bessel_k0.cu.o [ 95%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/modified_bessel_k1.cu.o [ 95%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/scaled_modified_bessel_k0.cu.o [ 95%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/scaled_modified_bessel_k1.cu.o [ 95%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/shifted_chebyshev_polynomial_t.cu.o [ 95%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/shifted_chebyshev_polynomial_u.cu.o [ 95%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/shifted_chebyshev_polynomial_v.cu.o [ 95%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/shifted_chebyshev_polynomial_w.cu.o [ 95%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/spherical_bessel_j0.cu.o [ 95%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/nested/cuda/NestedTensorBinaryOps.cu.o [ 95%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/nested/cuda/NestedTensorMatmul.cu.o [ 95%] Building CUDA object 
caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/nested/cuda/NestedTensorTransformerFunctions.cu.o [ 95%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/sparse/cuda/SoftMax.cu.o [ 95%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/sparse/cuda/SparseCUDATensor.cu.o [ 95%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/sparse/cuda/SparseCUDATensorMath.cu.o [ 95%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/sparse/cuda/SparseCsrTensorMath.cu.o [ 95%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/sparse/cuda/SparseMatMul.cu.o [ 95%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/sparse/cuda/SparseSemiStructuredLinear.cu.o [ 95%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/sparse/cuda/SparseSemiStructuredOps.cu.o [ 95%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/quantized/cuda/Activation.cu.o [ 95%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/quantized/cuda/AffineQuantizer.cu.o [ 95%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/quantized/cuda/EmbeddingBag.cu.o [ 95%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/quantized/cuda/FakeQuantizeCore.cu.o [ 95%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/quantized/cuda/FusedObsFakeQuant.cu.o [ 95%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/quantized/cuda/IntReprQuant.cu.o [ 95%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/quantized/cuda/MakePerTensorQuantizedTensor.cu.o [ 95%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/attention.cu.o [ 95%] Building CUDA object 
caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/attention_backward.cu.o [ 95%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/flash_attn/kernels/flash_bwd_hdim128_bf16_sm80.cu.o [ 95%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/flash_attn/kernels/flash_bwd_hdim128_fp16_sm80.cu.o [ 95%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/flash_attn/kernels/flash_bwd_hdim160_bf16_sm80.cu.o [ 95%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/flash_attn/kernels/flash_bwd_hdim160_fp16_sm80.cu.o [ 95%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/flash_attn/kernels/flash_bwd_hdim192_bf16_sm80.cu.o [ 95%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/flash_attn/kernels/flash_bwd_hdim192_fp16_sm80.cu.o [ 95%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/flash_attn/kernels/flash_bwd_hdim224_bf16_sm80.cu.o [ 95%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/flash_attn/kernels/flash_bwd_hdim224_fp16_sm80.cu.o [ 95%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/flash_attn/kernels/flash_bwd_hdim256_bf16_sm80.cu.o [ 95%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/flash_attn/kernels/flash_bwd_hdim256_fp16_sm80.cu.o [ 95%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/flash_attn/kernels/flash_bwd_hdim32_bf16_sm80.cu.o [ 95%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/flash_attn/kernels/flash_bwd_hdim32_fp16_sm80.cu.o [ 95%] Building CUDA object 
caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/flash_attn/kernels/flash_bwd_hdim64_bf16_sm80.cu.o [ 95%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/flash_attn/kernels/flash_bwd_hdim64_fp16_sm80.cu.o [ 96%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/flash_attn/kernels/flash_bwd_hdim96_bf16_sm80.cu.o [ 96%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/flash_attn/kernels/flash_bwd_hdim96_fp16_sm80.cu.o [ 96%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/flash_attn/kernels/flash_fwd_hdim128_bf16_sm80.cu.o [ 96%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/flash_attn/kernels/flash_fwd_hdim128_fp16_sm80.cu.o [ 96%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/flash_attn/kernels/flash_fwd_hdim160_bf16_sm80.cu.o [ 96%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/flash_attn/kernels/flash_fwd_hdim160_fp16_sm80.cu.o [ 96%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/flash_attn/kernels/flash_fwd_hdim192_bf16_sm80.cu.o [ 96%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/flash_attn/kernels/flash_fwd_hdim192_fp16_sm80.cu.o [ 96%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/flash_attn/kernels/flash_fwd_hdim224_bf16_sm80.cu.o [ 96%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/flash_attn/kernels/flash_fwd_hdim224_fp16_sm80.cu.o [ 96%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/flash_attn/kernels/flash_fwd_hdim256_bf16_sm80.cu.o [ 96%] Building 
CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/flash_attn/kernels/flash_fwd_hdim256_fp16_sm80.cu.o [ 96%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/flash_attn/kernels/flash_fwd_hdim32_bf16_sm80.cu.o [ 96%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/flash_attn/kernels/flash_fwd_hdim32_fp16_sm80.cu.o [ 96%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/flash_attn/kernels/flash_fwd_hdim64_bf16_sm80.cu.o [ 96%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/flash_attn/kernels/flash_fwd_hdim64_fp16_sm80.cu.o [ 96%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/flash_attn/kernels/flash_fwd_hdim96_bf16_sm80.cu.o [ 96%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/flash_attn/kernels/flash_fwd_hdim96_fp16_sm80.cu.o [ 96%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/flash_attn/kernels/flash_fwd_split_hdim128_bf16_sm80.cu.o [ 96%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/flash_attn/kernels/flash_fwd_split_hdim128_fp16_sm80.cu.o [ 96%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/flash_attn/kernels/flash_fwd_split_hdim160_bf16_sm80.cu.o [ 96%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/flash_attn/kernels/flash_fwd_split_hdim160_fp16_sm80.cu.o [ 96%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/flash_attn/kernels/flash_fwd_split_hdim192_bf16_sm80.cu.o [ 96%] Building CUDA object 
caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/flash_attn/kernels/flash_fwd_split_hdim192_fp16_sm80.cu.o [ 96%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/flash_attn/kernels/flash_fwd_split_hdim224_bf16_sm80.cu.o [ 96%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/flash_attn/kernels/flash_fwd_split_hdim224_fp16_sm80.cu.o [ 96%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/flash_attn/kernels/flash_fwd_split_hdim256_bf16_sm80.cu.o [ 96%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/flash_attn/kernels/flash_fwd_split_hdim256_fp16_sm80.cu.o [ 96%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/flash_attn/kernels/flash_fwd_split_hdim32_bf16_sm80.cu.o [ 96%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/flash_attn/kernels/flash_fwd_split_hdim32_fp16_sm80.cu.o [ 96%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/flash_attn/kernels/flash_fwd_split_hdim64_bf16_sm80.cu.o [ 96%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/flash_attn/kernels/flash_fwd_split_hdim64_fp16_sm80.cu.o [ 96%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/flash_attn/kernels/flash_fwd_split_hdim96_bf16_sm80.cu.o [ 96%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/flash_attn/kernels/flash_fwd_split_hdim96_fp16_sm80.cu.o [ 96%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/cutlassB_bf16_aligned_k128.cu.o [ 96%] Building CUDA object 
caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/cutlassB_bf16_aligned_k128_dropout.cu.o [ 96%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/cutlassB_bf16_aligned_k32.cu.o [ 96%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/cutlassB_bf16_aligned_k32_dropout.cu.o [ 96%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/cutlassB_bf16_aligned_k64.cu.o [ 96%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/cutlassB_bf16_aligned_k64_dropout.cu.o [ 96%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/cutlassB_bf16_aligned_k65536.cu.o [ 96%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/cutlassB_bf16_aligned_k65536_dropout.cu.o [ 96%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/cutlassB_bf16_aligned_k96.cu.o [ 96%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/cutlassB_f16_aligned_k128.cu.o [ 96%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/cutlassB_f16_aligned_k128_dropout.cu.o [ 96%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/cutlassB_f16_aligned_k32.cu.o [ 96%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/cutlassB_f16_aligned_k32_dropout.cu.o [ 96%] Building CUDA object 
caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/cutlassB_f16_aligned_k64.cu.o [ 96%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/cutlassB_f16_aligned_k64_dropout.cu.o [ 96%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/cutlassB_f16_aligned_k65536.cu.o [ 96%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/cutlassB_f16_aligned_k65536_dropout.cu.o [ 96%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/cutlassB_f16_aligned_k96.cu.o [ 96%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/cutlassB_f16_notaligned_k128.cu.o [ 96%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/cutlassB_f16_notaligned_k128_dropout.cu.o [ 96%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/cutlassB_f16_notaligned_k32.cu.o [ 96%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/cutlassB_f16_notaligned_k32_dropout.cu.o [ 96%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/cutlassB_f16_notaligned_k64.cu.o [ 96%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/cutlassB_f16_notaligned_k64_dropout.cu.o [ 96%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/cutlassB_f16_notaligned_k65536.cu.o [ 96%] Building CUDA object 
caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/cutlassB_f16_notaligned_k65536_dropout.cu.o [ 97%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/cutlassB_f32_aligned_k128.cu.o [ 97%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/cutlassB_f32_aligned_k128_dropout.cu.o [ 97%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/cutlassB_f32_aligned_k32.cu.o [ 97%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/cutlassB_f32_aligned_k32_dropout.cu.o [ 97%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/cutlassB_f32_aligned_k64.cu.o [ 97%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/cutlassB_f32_aligned_k64_dropout.cu.o [ 97%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/cutlassB_f32_aligned_k65536.cu.o [ 97%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/cutlassB_f32_aligned_k65536_dropout.cu.o [ 97%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/cutlassB_f32_notaligned_k128.cu.o [ 97%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/cutlassB_f32_notaligned_k128_dropout.cu.o [ 97%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/cutlassB_f32_notaligned_k32.cu.o [ 97%] Building CUDA object 
caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/cutlassB_f32_notaligned_k32_dropout.cu.o [ 97%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/cutlassB_f32_notaligned_k64.cu.o [ 97%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/cutlassB_f32_notaligned_k64_dropout.cu.o [ 97%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/cutlassB_f32_notaligned_k65536.cu.o [ 97%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/cutlassB_f32_notaligned_k65536_dropout.cu.o [ 97%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/cutlassF_bf16_aligned.cu.o [ 97%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/cutlassF_f16_aligned.cu.o [ 97%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/cutlassF_f16_notaligned.cu.o [ 97%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/cutlassF_f32_aligned.cu.o [ 97%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/cutlassF_f32_notaligned.cu.o [ 97%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/RegisterCUDA.cpp.o [ 97%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/RegisterNestedTensorCUDA.cpp.o [ 97%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/RegisterQuantizedCUDA.cpp.o [ 97%] Building CXX object 
caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/RegisterSparseCUDA.cpp.o [ 97%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/RegisterSparseCsrCUDA.cpp.o [ 97%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/UfuncCUDA_add.cu.o [ 97%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/cuda/CUDABlas.cpp.o [ 97%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/cuda/CUDASparseBlas.cpp.o [ 97%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/cuda/CublasHandlePool.cpp.o [ 97%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/cuda/tunable/StreamTimer.cpp.o [ 97%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/cuda/tunable/Tunable.cpp.o [ 97%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/Activation.cpp.o [ 97%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/LinearAlgebraStubs.cpp.o [ 97%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/Blas.cpp.o [ 97%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/Distributions.cpp.o [ 97%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/Equal.cpp.o [ 97%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/GridSampler.cpp.o [ 97%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/IndexKernel.cpp.o [ 97%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/ReduceOps.cpp.o [ 97%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/ScanKernels.cpp.o [ 97%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/Sort.cpp.o [ 97%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/Sorting.cpp.o [ 97%] Building CXX object 
caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/TensorModeKernel.cpp.o [ 97%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/TensorShapeCUDA.cpp.o [ 97%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/TensorTopK.cpp.o [ 97%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/jit_utils.cpp.o [ 97%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/nested/cuda/NestedTensorTransformerFunctions.cpp.o [ 97%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/sparse/cuda/SparseBlas.cpp.o [ 97%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/sparse/cuda/SparseBlasImpl.cpp.o [ 97%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/sparse/cuda/SparseBlasLegacy.cpp.o [ 97%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/sparse/cuda/SparseCUDABlas.cpp.o [ 97%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/flash_attn/flash_api.cpp.o [ 97%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/torch/csrc/CudaIPCTypes.cpp.o [ 97%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/torch/csrc/cuda/comm.cpp.o [ 97%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/torch/csrc/cuda/memory_snapshot.cpp.o [ 97%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/torch/csrc/inductor/aoti_runner/model_container_runner_cuda.cpp.o [ 97%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/torch/csrc/inductor/aoti_torch/shim_cuda.cpp.o [ 97%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/torch/csrc/jit/codegen/fuser/cuda/fused_kernel.cpp.o [ 98%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/torch/csrc/profiler/stubs/cuda.cpp.o [ 98%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/torch/csrc/autograd/functions/comm.cpp.o [ 98%] Building CXX 
object caffe2/CMakeFiles/torch_cuda.dir/__/torch/csrc/jit/passes/frozen_conv_add_relu_fusion_cuda.cpp.o [ 98%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/torch/csrc/jit/tensorexpr/cuda_codegen.cpp.o [ 98%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/torch/csrc/jit/runtime/register_cuda_ops.cpp.o [ 98%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/Unique.cu.o [ 98%] Linking CXX shared library ../lib/libtorch_cuda.so Warning: Unused direct dependencies: libc10_cuda.so /lib64/libgloo_cuda.so.1 /usr/local/cuda-12.3/lib64/libcurand.so.10 libc10.so.2.4 /lib64/libgflags.so.2.2 libtorch_cpu.so.2.4 [ 98%] Built target torch_cuda [ 98%] Building CXX object caffe2/CMakeFiles/torch_cuda_linalg.dir/__/aten/src/ATen/native/cuda/linalg/BatchLinearAlgebraLib.cpp.o [ 98%] Building CXX object caffe2/CMakeFiles/torch_cuda_linalg.dir/__/aten/src/ATen/native/cuda/linalg/BatchLinearAlgebra.cpp.o [ 98%] Building CXX object caffe2/CMakeFiles/torch_cuda_linalg.dir/__/aten/src/ATen/native/cuda/linalg/BatchLinearAlgebraLibBlas.cpp.o [ 98%] Building CXX object caffe2/CMakeFiles/torch.dir/__/empty.cpp.o [ 98%] Linking CXX shared library ../lib/libtorch.so Warning: Unused direct dependencies: /lib64/libstdc++.so.6 libtorch_cpu.so.2.4 libtorch_cuda.so [ 98%] Built target torch [ 98%] Building CXX object caffe2/CMakeFiles/torch_cuda_linalg.dir/__/aten/src/ATen/native/cuda/linalg/CUDASolver.cpp.o [ 98%] Building CXX object caffe2/CMakeFiles/torch_cuda_linalg.dir/__/aten/src/ATen/native/cuda/linalg/CusolverDnHandlePool.cpp.o [ 98%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/autograd/generated/python_functions_1.cpp.o [ 98%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/autograd/generated/python_functions_0.cpp.o [ 98%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/autograd/generated/python_functions_2.cpp.o [ 98%] Linking CXX shared library ../lib/libtorch_cuda_linalg.so 
Warning: Unused direct dependencies: libtorch_cpu.so.2.4 libtorch_cuda.so libc10_cuda.so /usr/local/cuda-12.3/lib64/libnvToolsExt.so.1 /lib64/libprotobuf.so.32 libc10.so.2.4 /lib64/libgflags.so.2.2 /lib64/libglog.so.0 /lib64/libqnnpack.so.1 /lib64/libgloo.so.1 /lib64/libgloo_cuda.so.1 [ 98%] Built target torch_cuda_linalg [ 98%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/autograd/generated/python_functions_3.cpp.o [ 98%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/autograd/generated/python_functions_4.cpp.o [ 98%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/autograd/generated/python_variable_methods.cpp.o [ 98%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/autograd/generated/python_torch_functions_0.cpp.o [ 98%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/autograd/generated/python_torch_functions_1.cpp.o [ 98%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/autograd/generated/python_torch_functions_2.cpp.o [ 98%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/autograd/generated/python_nn_functions.cpp.o [ 98%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/autograd/generated/python_fft_functions.cpp.o [ 98%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/autograd/generated/python_linalg_functions.cpp.o [ 98%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/autograd/generated/python_nested_functions.cpp.o [ 98%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/autograd/generated/python_sparse_functions.cpp.o [ 98%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/autograd/generated/python_special_functions.cpp.o [ 98%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/autograd/generated/python_return_types.cpp.o [ 98%] Building CXX object 
caffe2/torch/CMakeFiles/torch_python.dir/csrc/autograd/generated/python_enum_tag.cpp.o [ 98%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/DataLoader.cpp.o [ 98%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/Device.cpp.o [ 98%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/Dtype.cpp.o [ 98%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/DynamicTypes.cpp.o [ 98%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/Exceptions.cpp.o [ 98%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/Generator.cpp.o [ 98%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/Layout.cpp.o [ 98%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/MemoryFormat.cpp.o [ 98%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/QScheme.cpp.o [ 98%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/Module.cpp.o [ 98%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/PyInterpreter.cpp.o [ 98%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/python_dimname.cpp.o [ 98%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/Size.cpp.o [ 98%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/Storage.cpp.o [ 98%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/StorageMethods.cpp.o [ 98%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/StorageSharing.cpp.o [ 98%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/Stream.cpp.o [ 98%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/TypeInfo.cpp.o [ 98%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/api/src/python/init.cpp.o [ 98%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/autograd/functions/init.cpp.o [ 98%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/autograd/init.cpp.o [ 98%] 
Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/autograd/profiler_python.cpp.o [ 98%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/autograd/python_anomaly_mode.cpp.o [ 98%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/autograd/python_saved_variable_hooks.cpp.o [ 98%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/autograd/python_cpp_function.cpp.o [ 98%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/autograd/python_engine.cpp.o [ 98%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/autograd/python_function.cpp.o [ 98%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/autograd/python_hook.cpp.o [ 99%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/autograd/python_legacy_variable.cpp.o [ 99%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/autograd/python_nested_functions_manual.cpp.o [ 99%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/autograd/python_torch_functions_manual.cpp.o [ 99%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/autograd/python_variable.cpp.o [ 99%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/autograd/python_variable_indexing.cpp.o [ 99%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/dynamo/python_compiled_autograd.cpp.o [ 99%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/dynamo/cache_entry.cpp.o [ 99%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/dynamo/cpp_shim.cpp.o [ 99%] Building C object caffe2/torch/CMakeFiles/torch_python.dir/csrc/dynamo/cpython_defs.c.o [ 99%] Building C object caffe2/torch/CMakeFiles/torch_python.dir/csrc/dynamo/eval_frame.c.o [ 99%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/dynamo/extra_state.cpp.o [ 99%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/dynamo/guards.cpp.o [ 99%] Building 
CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/dynamo/init.cpp.o [ 99%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/functorch/init.cpp.o [ 99%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/mps/Module.cpp.o [ 99%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/inductor/aoti_runner/pybind.cpp.o [ 99%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/jit/backends/backend_init.cpp.o [ 99%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/jit/python/init.cpp.o [ 99%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/jit/passes/onnx.cpp.o [ 99%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/jit/passes/onnx/cast_all_constant_to_floating.cpp.o [ 99%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/jit/passes/onnx/deduplicate_initializers.cpp.o [ 99%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/jit/passes/onnx/eval_peephole.cpp.o [ 99%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/jit/passes/onnx/constant_fold.cpp.o [ 99%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/jit/passes/onnx/constant_map.cpp.o [ 99%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/jit/passes/onnx/eliminate_unused_items.cpp.o [ 99%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/jit/passes/onnx/fixup_onnx_controlflow.cpp.o [ 99%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/jit/passes/onnx/list_model_parameters.cpp.o [ 99%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/jit/passes/onnx/function_substitution.cpp.o [ 99%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/jit/passes/onnx/helper.cpp.o [ 99%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/jit/passes/onnx/peephole.cpp.o [ 99%] Building CXX object 
caffe2/torch/CMakeFiles/torch_python.dir/csrc/jit/passes/onnx/preprocess_for_onnx.cpp.o [ 99%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/jit/passes/onnx/prepare_division_for_onnx.cpp.o [ 99%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/jit/passes/onnx/scalar_type_analysis.cpp.o [ 99%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/jit/passes/onnx/unpack_quantized_weights.cpp.o [ 99%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/jit/passes/onnx/remove_inplace_ops_for_onnx.cpp.o [ 99%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/jit/passes/onnx/shape_type_inference.cpp.o [ 99%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/jit/passes/onnx/function_extraction.cpp.o [ 99%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/jit/passes/onnx/onnx_log.cpp.o [ 99%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/jit/passes/onnx/naming.cpp.o [ 99%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/jit/python/pybind_utils.cpp.o [ 99%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/jit/passes/onnx/pattern_conversion/autograd_function_process.cpp.o [ 99%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/jit/passes/onnx/pattern_conversion/common.cpp.o [ 99%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/jit/passes/onnx/pattern_conversion/pattern_encapsulation.cpp.o [ 99%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/jit/passes/onnx/pattern_conversion/pattern_conversion.cpp.o [ 99%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/jit/python/python_arg_flatten.cpp.o [ 99%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/jit/python/python_custom_class.cpp.o [ 99%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/jit/python/python_dict.cpp.o [ 99%] Building CXX object 
caffe2/torch/CMakeFiles/torch_python.dir/csrc/jit/python/python_interpreter.cpp.o [ 99%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/jit/python/python_ir.cpp.o [ 99%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/jit/python/python_list.cpp.o [ 99%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/jit/python/script_init.cpp.o [ 99%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/jit/python/python_tracer.cpp.o [ 99%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/jit/frontend/concrete_module_type.cpp.o [ 99%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/jit/frontend/tree_views.cpp.o [ 99%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/jit/python/python_sugared_value.cpp.o [ 99%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/jit/python/python_tree_views.cpp.o [ 99%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/jit/runtime/static/init.cpp.o [ 99%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/jit/tensorexpr/tensorexpr_init.cpp.o [ 99%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/monitor/python_init.cpp.o [ 99%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/multiprocessing/init.cpp.o [100%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/onnx/init.cpp.o [100%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/profiler/python/init.cpp.o [100%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/profiler/python/combined_traceback.cpp.o [100%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/serialization.cpp.o [100%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/tensor/python_tensor.cpp.o [100%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/utils/init.cpp.o [100%] Building CXX object 
caffe2/torch/CMakeFiles/torch_python.dir/csrc/utils/throughput_benchmark.cpp.o [100%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/utils.cpp.o [100%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/utils/device_lazy_init.cpp.o [100%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/utils/invalid_arguments.cpp.o [100%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/utils/nested.cpp.o [100%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/utils/object_ptr.cpp.o [100%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/utils/python_arg_parser.cpp.o [100%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/utils/python_dispatch.cpp.o [100%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/utils/python_symnode.cpp.o [100%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/utils/pybind.cpp.o [100%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/utils/pyobject_preservation.cpp.o [100%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/utils/structseq.cpp.o [100%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/utils/tensor_apply.cpp.o [100%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/utils/tensor_dtypes.cpp.o [100%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/utils/tensor_layouts.cpp.o [100%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/utils/tensor_memoryformats.cpp.o [100%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/utils/tensor_qschemes.cpp.o [100%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/utils/tensor_list.cpp.o [100%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/utils/tensor_new.cpp.o [100%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/utils/tensor_numpy.cpp.o [100%] Building CXX object 
caffe2/torch/CMakeFiles/torch_python.dir/csrc/utils/tensor_types.cpp.o [100%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/utils/disable_torch_function.cpp.o [100%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/utils/verbose.cpp.o [100%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/cpu/Module.cpp.o [100%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/lazy/python/init.cpp.o [100%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/lazy/python/python_util.cpp.o [100%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/cuda/Event.cpp.o [100%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/cuda/Module.cpp.o [100%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/cuda/python_comm.cpp.o [100%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/cuda/Stream.cpp.o [100%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/cuda/Graph.cpp.o [100%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/cuda/shared/cudart.cpp.o [100%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/cuda/shared/nvtx.cpp.o [100%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/cuda/utils.cpp.o [100%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/cuda/CUDAPluggableAllocator.cpp.o [100%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/cuda/shared/cudnn.cpp.o [100%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/distributed/c10d/init.cpp.o [100%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/distributed/c10d/python_comm_hook.cpp.o [100%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/distributed/autograd/init.cpp.o [100%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/distributed/rpc/init.cpp.o [100%] Building CXX object 
caffe2/torch/CMakeFiles/torch_python.dir/csrc/distributed/rpc/py_rref.cpp.o [100%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/distributed/rpc/python_functions.cpp.o [100%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/distributed/rpc/python_rpc_handler.cpp.o [100%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/distributed/rpc/request_callback_impl.cpp.o [100%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/distributed/rpc/testing/init.cpp.o [100%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/distributed/rpc/unpickled_python_call.cpp.o [100%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/distributed/rpc/unpickled_python_remote_call.cpp.o [100%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/jit/runtime/register_distributed_ops.cpp.o [100%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/cuda/python_nccl.cpp.o [100%] Linking CXX shared library ../../lib/libtorch_python.so Warning: Unused direct dependencies: libshm.so.2.4 libtorch.so.2.4 libtorch_cpu.so.2.4 libtorch_cuda.so libc10_cuda.so libc10.so.2.4 [100%] Built target torch_python [100%] Building C object caffe2/torch/CMakeFiles/_C.dir/csrc/stub.c.o [100%] Building CXX object caffe2/torch/CMakeFiles/nnapi_backend.dir/csrc/jit/backends/nnapi/nnapi_backend_lib.cpp.o [100%] Building CXX object caffe2/torch/CMakeFiles/nnapi_backend.dir/csrc/jit/backends/nnapi/nnapi_backend_preprocess.cpp.o [100%] Building CXX object functorch/CMakeFiles/functorch.dir/csrc/dim/dim.cpp.o [100%] Linking C shared library ../../lib/_C.so Warning: Unused direct dependencies: /lib64/libstdc++.so.6 libtorch_python.so.2.4 [100%] Built target _C [100%] Building C object functorch/CMakeFiles/functorch.dir/csrc/dim/dim_opcode.c.o [100%] Building CXX object functorch/CMakeFiles/functorch.dir/csrc/init_dim_only.cpp.o [100%] Linking CXX shared library ../../lib/libnnapi_backend.so Warning: 
Unused direct dependencies: libtorch.so.2.4 libtorch_python.so.2.4 libtorch_cpu.so.2.4 libtorch_cuda.so libc10.so.2.4 [100%] Built target nnapi_backend [100%] Linking CXX shared module functorch.so [100%] Built target functorch + popd ~/build/BUILD/pytorch + RPM_EC=0 ++ jobs -p + exit 0 Executing(%install): /bin/sh -e /var/tmp/rpm-tmp.2HHbUr + umask 022 + cd /builddir/build/BUILD + '[' /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64 '!=' / ']' + rm -rf /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64 ++ dirname /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64 + mkdir -p /builddir/build/BUILDROOT + mkdir /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64 + CFLAGS='-O2 -fexceptions -grecord-gcc-switches -pipe -Wall -Wno-complain-wrong-lang -Werror=format-security -Wp,-U_FORTIFY_SOURCE,-D_FORTIFY_SOURCE=3 -Wp,-D_GLIBCXX_ASSERTIONS -mbranch-protection=standard -fasynchronous-unwind-tables -fstack-clash-protection -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer -w -fpermissive -Wno-sign-compare -Wno-deprecated-declarations -Wno-nonnull -DEIGEN_HAS_CXX11_MATH=1 ' + export CFLAGS ~/build/BUILD/pytorch/build ~/build/BUILD/pytorch + CXXFLAGS='-O2 -fexceptions -grecord-gcc-switches -pipe -Wall -Wno-complain-wrong-lang -Werror=format-security -Wp,-U_FORTIFY_SOURCE,-D_FORTIFY_SOURCE=3 -Wp,-D_GLIBCXX_ASSERTIONS -mbranch-protection=standard -fasynchronous-unwind-tables -fstack-clash-protection -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer -w -fpermissive -Wno-sign-compare -Wno-deprecated-declarations -Wno-nonnull -DEIGEN_HAS_CXX11_MATH=1 ' + export CXXFLAGS + FFLAGS='-O2 -fexceptions -grecord-gcc-switches -pipe -Wall -Wno-complain-wrong-lang -Werror=format-security -Wp,-U_FORTIFY_SOURCE,-D_FORTIFY_SOURCE=3 -Wp,-D_GLIBCXX_ASSERTIONS -mbranch-protection=standard -fasynchronous-unwind-tables -fstack-clash-protection 
-fno-omit-frame-pointer -mno-omit-leaf-frame-pointer -w -fpermissive -Wno-sign-compare -Wno-deprecated-declarations -Wno-nonnull -DEIGEN_HAS_CXX11_MATH=1 -I/usr/lib64/gfortran/modules ' + export FFLAGS + FCFLAGS='-O2 -fexceptions -grecord-gcc-switches -pipe -Wall -Wno-complain-wrong-lang -Werror=format-security -Wp,-U_FORTIFY_SOURCE,-D_FORTIFY_SOURCE=3 -Wp,-D_GLIBCXX_ASSERTIONS -mbranch-protection=standard -fasynchronous-unwind-tables -fstack-clash-protection -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer -w -fpermissive -Wno-sign-compare -Wno-deprecated-declarations -Wno-nonnull -DEIGEN_HAS_CXX11_MATH=1 -I/usr/lib64/gfortran/modules ' + export FCFLAGS + VALAFLAGS=-g + export VALAFLAGS + RUSTFLAGS='-Copt-level=3 -Cdebuginfo=2 -Ccodegen-units=1 -Cstrip=none -Cforce-frame-pointers=yes -Clink-arg=-specs=/usr/lib/rpm/redhat/redhat-package-notes --cap-lints=warn' + export RUSTFLAGS + LDFLAGS='-Wl,-z,relro -Wl,--as-needed -Wl,--build-id=sha1 -specs=/usr/lib/rpm/redhat/redhat-package-notes -Wl,-lstdc++' + export LDFLAGS + LT_SYS_LIBRARY_PATH=/usr/lib64: + export LT_SYS_LIBRARY_PATH + CC=gcc + export CC + CXX=g++ + export CXX + cd pytorch + pushd build + export PYTHON_EXECUTABLE=/usr/bin/python3 + PYTHON_EXECUTABLE=/usr/bin/python3 + make install DESTDIR=/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64 [ 0%] Built target clog [ 0%] Built target fp16 [ 1%] Built target pytorch_qnnpack [ 1%] Built target fxdiv [ 1%] Built target psimd [ 63%] Built target microkernels-all [ 67%] Built target microkernels-prod [ 67%] Built target logging [ 67%] Built target hardware-config [ 67%] Built target indirection [ 68%] Built target jit [ 68%] Built target microparams-init [ 68%] Built target normalization [ 68%] Built target packing [ 68%] Built target allocator [ 68%] Built target memory [ 68%] Built target cache [ 68%] Built target microkernel-utils [ 68%] Built target mutex [ 68%] Built target post-operation [ 68%] Built target operator-utils 
[ 69%] Built target operators [ 69%] Built target operator-run [ 70%] Built target subgraph [ 70%] Built target convolution-test-helpers [ 70%] Built target XNNPACK [ 70%] Built target fmt [ 72%] Built target c10 [ 72%] Built target c10_cuda [ 72%] Built target Caffe2_PROTO [ 72%] Built target caffe2_protos [ 72%] Built target caffe2_nvrtc [ 72%] Built target ATEN_CPU_FILES_GEN_TARGET [ 89%] Built target torch_cpu [ 89%] Built target ATEN_CUDA_FILES_GEN_TARGET [ 97%] Built target torch_cuda [ 97%] Built target torch [ 97%] Built target torch_cuda_linalg [ 97%] Built target torch_global_deps [ 97%] Built target python_copy_files [ 97%] Built target shm [ 97%] Built target generate-torch-sources [ 97%] Built target torch_python_stubs [ 97%] Built target gen_torch_version [ 99%] Built target torch_python [ 99%] Built target _C [ 99%] Built target nnapi_backend [100%] Built target torch_shm_manager [100%] Built target functorch Install the project... -- Install configuration: "Release" + mkdir -p /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64/usr/lib64 + find /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64/ -name '*.a' -type f -prune -exec rm -rf '{}' + + rm -rf /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64/usr/lib/python3.12 + mv -f /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64/usr/lib/libc10.so /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64/usr/lib/libc10.so.2.4 /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64/usr/lib/libc10.so.2.4.0 /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64/usr/lib/libc10_cuda.so /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64/usr/lib/libcaffe2_nvrtc.so /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64/usr/lib/libshm.so 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64/usr/lib/libshm.so.2.4 /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64/usr/lib/libshm.so.2.4.0 /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64/usr/lib/libtorch.so /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64/usr/lib/libtorch.so.2.4 /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64/usr/lib/libtorch.so.2.4.0 /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64/usr/lib/libtorch_cpu.so /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64/usr/lib/libtorch_cpu.so.2.4 /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64/usr/lib/libtorch_cpu.so.2.4.0 /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64/usr/lib/libtorch_cuda.so /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64/usr/lib/libtorch_cuda_linalg.so /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64/usr/lib/libtorch_global_deps.so /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64/usr/lib/libtorch_global_deps.so.2.4 /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64/usr/lib/libtorch_global_deps.so.2.4.0 /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64/usr/lib/libtorch_python.so /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64/usr/lib/libtorch_python.so.2.4 /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64/usr/lib/libtorch_python.so.2.4.0 /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64/usr/lib64/ + popd ~/build/BUILD/pytorch + install -D -pm 755 build/lib/libnnapi_backend.so 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/ + mkdir -p /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/torch/bin + install -D -pm 644 build/lib/_C.so /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/torch/ + install -D -pm 644 build/functorch/functorch.so /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/functorch/_C.so + install -D -pm 644 aten/src/THC/THCDeviceUtils.cuh /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/include/THC/ + ln -sf /usr/include /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/torch/include + ln -sf /usr/lib64 /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/torch/lib + ln -sf /usr/bin/torch_shm_manager /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/torch/bin/torch_shm_manager ++ find ./torch/ -name '*.py' + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/version.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/version.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/xpu/streams.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/xpu/streams.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/xpu/random.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/xpu/random.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 
./torch/xpu/_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/xpu/_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/xpu/_gpu_trace.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/xpu/_gpu_trace.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/xpu/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/xpu/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/weak.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/weak.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/viz/_cycles.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/viz/_cycles.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/viz/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/viz/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/throughput_benchmark.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/throughput_benchmark.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/tensorboard/writer.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/tensorboard/writer.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/tensorboard/summary.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/tensorboard/summary.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/tensorboard/_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/tensorboard/_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/tensorboard/_pytorch_graph.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/tensorboard/_pytorch_graph.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/tensorboard/_proto_graph.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/tensorboard/_proto_graph.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/tensorboard/_onnx_graph.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/tensorboard/_onnx_graph.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/tensorboard/_embedding.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/tensorboard/_embedding.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/tensorboard/_convert_np.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/tensorboard/_convert_np.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/tensorboard/_caffe2_graph.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/tensorboard/_caffe2_graph.py + for f 
in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/tensorboard/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/tensorboard/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/show_pickle.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/show_pickle.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/model_zoo.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/model_zoo.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/model_dump/__main__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/model_dump/__main__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/model_dump/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/model_dump/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/mobile_optimizer.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/mobile_optimizer.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/mkldnn.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/mkldnn.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/jit/log_extract.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/jit/log_extract.py + for f in `find ./torch/ -name '*.py'` + install 
-D -pm 644 ./torch/utils/jit/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/jit/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/hooks.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/hooks.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/hipify/version.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/hipify/version.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/hipify/hipify_python.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/hipify/hipify_python.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/hipify/cuda_to_hip_mappings.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/hipify/cuda_to_hip_mappings.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/hipify/constants.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/hipify/constants.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/hipify/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/hipify/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/flop_counter.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/flop_counter.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 
./torch/utils/file_baton.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/file_baton.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/dlpack.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/dlpack.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/deterministic.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/deterministic.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/data/sampler.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/data/sampler.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/data/graph_settings.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/data/graph_settings.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/data/graph.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/data/graph.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/data/distributed.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/data/distributed.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/data/dataset.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/data/dataset.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/data/datapipes/utils/snapshot.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/data/datapipes/utils/snapshot.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/data/datapipes/utils/decoder.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/data/datapipes/utils/decoder.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/data/datapipes/utils/common.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/data/datapipes/utils/common.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/data/datapipes/utils/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/data/datapipes/utils/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/data/datapipes/map/utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/data/datapipes/map/utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/data/datapipes/map/grouping.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/data/datapipes/map/grouping.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/data/datapipes/map/combining.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/data/datapipes/map/combining.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/data/datapipes/map/combinatorics.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/data/datapipes/map/combinatorics.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/data/datapipes/map/callable.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/data/datapipes/map/callable.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/data/datapipes/map/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/data/datapipes/map/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/data/datapipes/iter/utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/data/datapipes/iter/utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/data/datapipes/iter/streamreader.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/data/datapipes/iter/streamreader.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/data/datapipes/iter/sharding.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/data/datapipes/iter/sharding.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/data/datapipes/iter/selecting.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/data/datapipes/iter/selecting.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/data/datapipes/iter/routeddecoder.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/data/datapipes/iter/routeddecoder.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/data/datapipes/iter/grouping.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/data/datapipes/iter/grouping.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/data/datapipes/iter/fileopener.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/data/datapipes/iter/fileopener.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/data/datapipes/iter/filelister.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/data/datapipes/iter/filelister.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/data/datapipes/iter/combining.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/data/datapipes/iter/combining.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/data/datapipes/iter/combinatorics.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/data/datapipes/iter/combinatorics.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/data/datapipes/iter/callable.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/data/datapipes/iter/callable.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/data/datapipes/iter/__init__.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/data/datapipes/iter/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/data/datapipes/gen_pyi.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/data/datapipes/gen_pyi.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/data/datapipes/datapipe.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/data/datapipes/datapipe.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/data/datapipes/dataframe/structures.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/data/datapipes/dataframe/structures.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/data/datapipes/dataframe/datapipes.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/data/datapipes/dataframe/datapipes.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/data/datapipes/dataframe/dataframes.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/data/datapipes/dataframe/dataframes.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/data/datapipes/dataframe/dataframe_wrapper.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/data/datapipes/dataframe/dataframe_wrapper.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/data/datapipes/dataframe/__init__.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/data/datapipes/dataframe/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/data/datapipes/_typing.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/data/datapipes/_typing.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/data/datapipes/_hook_iterator.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/data/datapipes/_hook_iterator.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/data/datapipes/_decorator.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/data/datapipes/_decorator.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/data/datapipes/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/data/datapipes/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/data/dataloader.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/data/dataloader.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/data/backward_compatibility.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/data/backward_compatibility.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/data/_utils/worker.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/data/_utils/worker.py 
+ for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/data/_utils/signal_handling.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/data/_utils/signal_handling.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/data/_utils/pin_memory.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/data/_utils/pin_memory.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/data/_utils/fetch.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/data/_utils/fetch.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/data/_utils/collate.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/data/_utils/collate.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/data/_utils/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/data/_utils/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/data/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/data/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/cpp_extension.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/cpp_extension.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/cpp_backtrace.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/cpp_backtrace.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/collect_env.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/collect_env.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/checkpoint.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/checkpoint.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/bundled_inputs.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/bundled_inputs.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/bottleneck/__main__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/bottleneck/__main__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/bottleneck/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/bottleneck/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/benchmark/utils/valgrind_wrapper/timer_interface.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/benchmark/utils/valgrind_wrapper/timer_interface.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/benchmark/utils/valgrind_wrapper/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/benchmark/utils/valgrind_wrapper/__init__.py + for f in 
`find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/benchmark/utils/timer.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/benchmark/utils/timer.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/benchmark/utils/sparse_fuzzer.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/benchmark/utils/sparse_fuzzer.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/benchmark/utils/fuzzer.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/benchmark/utils/fuzzer.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/benchmark/utils/cpp_jit.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/benchmark/utils/cpp_jit.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/benchmark/utils/compile.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/benchmark/utils/compile.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/benchmark/utils/compare.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/benchmark/utils/compare.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/benchmark/utils/common.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/benchmark/utils/common.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/benchmark/utils/_stubs.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/benchmark/utils/_stubs.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/benchmark/utils/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/benchmark/utils/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/benchmark/op_fuzzers/unary.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/benchmark/op_fuzzers/unary.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/benchmark/op_fuzzers/spectral.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/benchmark/op_fuzzers/spectral.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/benchmark/op_fuzzers/sparse_unary.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/benchmark/op_fuzzers/sparse_unary.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/benchmark/op_fuzzers/sparse_binary.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/benchmark/op_fuzzers/sparse_binary.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/benchmark/op_fuzzers/binary.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/benchmark/op_fuzzers/binary.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/benchmark/op_fuzzers/__init__.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/benchmark/op_fuzzers/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/benchmark/examples/spectral_ops_fuzz_test.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/benchmark/examples/spectral_ops_fuzz_test.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/benchmark/examples/sparse/op_benchmark.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/benchmark/examples/sparse/op_benchmark.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/benchmark/examples/sparse/fuzzer.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/benchmark/examples/sparse/fuzzer.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/benchmark/examples/sparse/compare.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/benchmark/examples/sparse/compare.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/benchmark/examples/simple_timeit.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/benchmark/examples/simple_timeit.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/benchmark/examples/op_benchmark.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/benchmark/examples/op_benchmark.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/benchmark/examples/fuzzer.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/benchmark/examples/fuzzer.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/benchmark/examples/compare.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/benchmark/examples/compare.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/benchmark/examples/blas_compare_setup.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/benchmark/examples/blas_compare_setup.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/benchmark/examples/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/benchmark/examples/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/benchmark/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/benchmark/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/backend_registration.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/backend_registration.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/backcompat/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/backcompat/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/_zip.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/_zip.py + for f in `find 
./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/_typing_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/_typing_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/_triton.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/_triton.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/_traceback.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/_traceback.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/_sympy/value_ranges.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/_sympy/value_ranges.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/_sympy/solve.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/_sympy/solve.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/_sympy/singleton_int.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/_sympy/singleton_int.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/_sympy/reference.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/_sympy/reference.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/_sympy/interp.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/_sympy/interp.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 
./torch/utils/_sympy/functions.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/_sympy/functions.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/_sympy/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/_sympy/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/_stats.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/_stats.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/_pytree.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/_pytree.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/_python_dispatch.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/_python_dispatch.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/_mode_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/_mode_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/_import_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/_import_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/_freeze.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/_freeze.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/_foreach_utils.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/_foreach_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/_exposed_in.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/_exposed_in.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/_device.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/_device.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/_cxx_pytree.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/_cxx_pytree.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/_cpp_extension_versioner.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/_cpp_extension_versioner.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/_contextlib.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/_contextlib.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/_content_store.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/_content_store.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/_config_module.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/_config_module.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/__init__.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/utils/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/types.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/types.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/torch_version.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/torch_version.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/two_tensor.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/two_tensor.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/triton_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/triton_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/torchbind_impls.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/torchbind_impls.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/test_module/no_future_div.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/test_module/no_future_div.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/test_module/future_div.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/test_module/future_div.py + for f in `find ./torch/ -name '*.py'` + 
install -D -pm 644 ./torch/testing/_internal/test_module/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/test_module/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/static_module.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/static_module.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/quantization_torch_package_models.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/quantization_torch_package_models.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/optests/make_fx.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/optests/make_fx.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/optests/generate_tests.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/optests/generate_tests.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/optests/fake_tensor.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/optests/fake_tensor.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/optests/autograd_registration.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/optests/autograd_registration.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 
644 ./torch/testing/_internal/optests/aot_autograd.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/optests/aot_autograd.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/optests/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/optests/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/opinfo/utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/opinfo/utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/opinfo/refs.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/opinfo/refs.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/opinfo/definitions/special.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/opinfo/definitions/special.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/opinfo/definitions/sparse.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/opinfo/definitions/sparse.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/opinfo/definitions/signal.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/opinfo/definitions/signal.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 
./torch/testing/_internal/opinfo/definitions/linalg.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/opinfo/definitions/linalg.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/opinfo/definitions/fft.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/opinfo/definitions/fft.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/opinfo/definitions/_masked.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/opinfo/definitions/_masked.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/opinfo/definitions/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/opinfo/definitions/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/opinfo/core.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/opinfo/core.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/opinfo/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/opinfo/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/logging_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/logging_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 
./torch/testing/_internal/logging_tensor.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/logging_tensor.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/jit_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/jit_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/jit_metaprogramming_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/jit_metaprogramming_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/inductor_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/inductor_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/hypothesis_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/hypothesis_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/hop_db.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/hop_db.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/generated/annotated_fn_args.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/generated/annotated_fn_args.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/generated/__init__.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/generated/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/dynamo_test_failures.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/dynamo_test_failures.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/distributed/rpc_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/distributed/rpc_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/distributed/rpc/tensorpipe_rpc_agent_test_fixture.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/distributed/rpc/tensorpipe_rpc_agent_test_fixture.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/distributed/rpc/rpc_test.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/distributed/rpc/rpc_test.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/distributed/rpc/rpc_agent_test_fixture.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/distributed/rpc/rpc_agent_test_fixture.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/distributed/rpc/jit/rpc_test_faulty.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/distributed/rpc/jit/rpc_test_faulty.py + for f in `find 
./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/distributed/rpc/jit/rpc_test.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/distributed/rpc/jit/rpc_test.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/distributed/rpc/jit/dist_autograd_test.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/distributed/rpc/jit/dist_autograd_test.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/distributed/rpc/jit/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/distributed/rpc/jit/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/distributed/rpc/faulty_rpc_agent_test_fixture.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/distributed/rpc/faulty_rpc_agent_test_fixture.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/distributed/rpc/faulty_agent_rpc_test.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/distributed/rpc/faulty_agent_rpc_test.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/distributed/rpc/examples/reinforcement_learning_rpc_test.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/distributed/rpc/examples/reinforcement_learning_rpc_test.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 
./torch/testing/_internal/distributed/rpc/examples/parameter_server_test.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/distributed/rpc/examples/parameter_server_test.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/distributed/rpc/examples/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/distributed/rpc/examples/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/distributed/rpc/dist_optimizer_test.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/distributed/rpc/dist_optimizer_test.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/distributed/rpc/dist_autograd_test.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/distributed/rpc/dist_autograd_test.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/distributed/rpc/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/distributed/rpc/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/distributed/pipeline/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/distributed/pipeline/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/distributed/pipe_with_ddp_test.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/distributed/pipe_with_ddp_test.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/distributed/nn/api/remote_module_test.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/distributed/nn/api/remote_module_test.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/distributed/nn/api/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/distributed/nn/api/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/distributed/nn/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/distributed/nn/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/distributed/multi_threaded_pg.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/distributed/multi_threaded_pg.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/distributed/fake_pg.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/distributed/fake_pg.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/distributed/distributed_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/distributed/distributed_utils.py + for f in `find ./torch/ -name '*.py'` + install -D 
-pm 644 ./torch/testing/_internal/distributed/distributed_test.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/distributed/distributed_test.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/distributed/ddp_under_dist_autograd_test.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/distributed/ddp_under_dist_autograd_test.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/distributed/common_state_dict.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/distributed/common_state_dict.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/distributed/checkpoint_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/distributed/checkpoint_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/distributed/_tensor/common_dtensor.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/distributed/_tensor/common_dtensor.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/distributed/_tensor/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/distributed/_tensor/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/distributed/_shard/test_common.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/distributed/_shard/test_common.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/distributed/_shard/sharded_tensor/_test_st_common.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/distributed/_shard/sharded_tensor/_test_st_common.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/distributed/_shard/sharded_tensor/_test_ops_common.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/distributed/_shard/sharded_tensor/_test_ops_common.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/distributed/_shard/sharded_tensor/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/distributed/_shard/sharded_tensor/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/distributed/_shard/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/distributed/_shard/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/distributed/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/distributed/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/dist_utils.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/dist_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/data/network2.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/data/network2.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/data/network1.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/data/network1.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/data/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/data/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/custom_op_db.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/custom_op_db.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/composite_compliance.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/composite_compliance.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/common_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/common_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/common_subclass.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/common_subclass.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/common_quantized.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/common_quantized.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/common_quantization.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/common_quantization.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/common_pruning.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/common_pruning.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/common_optimizers.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/common_optimizers.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/common_nn.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/common_nn.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/common_modules.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/common_modules.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/common_mkldnn.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/common_mkldnn.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/common_methods_invocations.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/common_methods_invocations.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/common_jit.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/common_jit.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/common_fsdp.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/common_fsdp.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/common_dtype.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/common_dtype.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/common_distributed.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/common_distributed.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/common_dist_composable.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/common_dist_composable.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/common_device_type.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/common_device_type.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/testing/_internal/common_cuda.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/common_cuda.py
+ install -D -pm 644 ./torch/testing/_internal/codegen/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/codegen/__init__.py
+ install -D -pm 644 ./torch/testing/_internal/check_kernel_launches.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/check_kernel_launches.py
+ install -D -pm 644 ./torch/testing/_internal/autograd_function_db.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/autograd_function_db.py
+ install -D -pm 644 ./torch/testing/_internal/autocast_test_lists.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/autocast_test_lists.py
+ install -D -pm 644 ./torch/testing/_internal/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/__init__.py
+ install -D -pm 644 ./torch/testing/_creation.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_creation.py
+ install -D -pm 644 ./torch/testing/_comparison.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/_comparison.py
+ install -D -pm 644 ./torch/testing/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/testing/__init__.py
+ install -D -pm 644 ./torch/storage.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/storage.py
+ install -D -pm 644 ./torch/special/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/special/__init__.py
+ install -D -pm 644 ./torch/sparse/semi_structured.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/sparse/semi_structured.py
+ install -D -pm 644 ./torch/sparse/_triton_ops_meta.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/sparse/_triton_ops_meta.py
+ install -D -pm 644 ./torch/sparse/_triton_ops.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/sparse/_triton_ops.py
+ install -D -pm 644 ./torch/sparse/_semi_structured_ops.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/sparse/_semi_structured_ops.py
+ install -D -pm 644 ./torch/sparse/_semi_structured_conversions.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/sparse/_semi_structured_conversions.py
+ install -D -pm 644 ./torch/sparse/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/sparse/__init__.py
+ install -D -pm 644 ./torch/signal/windows/windows.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/signal/windows/windows.py
+ install -D -pm 644 ./torch/signal/windows/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/signal/windows/__init__.py
+ install -D -pm 644 ./torch/signal/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/signal/__init__.py
+ install -D -pm 644 ./torch/serialization.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/serialization.py
+ install -D -pm 644 ./torch/return_types.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/return_types.py
+ install -D -pm 644 ./torch/random.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/random.py
+ install -D -pm 644 ./torch/quasirandom.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/quasirandom.py
+ install -D -pm 644 ./torch/quantization/utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/quantization/utils.py
+ install -D -pm 644 ./torch/quantization/stubs.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/quantization/stubs.py
+ install -D -pm 644 ./torch/quantization/quantize_jit.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/quantization/quantize_jit.py
+ install -D -pm 644 ./torch/quantization/quantize_fx.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/quantization/quantize_fx.py
+ install -D -pm 644 ./torch/quantization/quantize.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/quantization/quantize.py
+ install -D -pm 644 ./torch/quantization/quantization_mappings.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/quantization/quantization_mappings.py
+ install -D -pm 644 ./torch/quantization/quant_type.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/quantization/quant_type.py
+ install -D -pm 644 ./torch/quantization/qconfig.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/quantization/qconfig.py
+ install -D -pm 644 ./torch/quantization/observer.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/quantization/observer.py
+ install -D -pm 644 ./torch/quantization/fx/utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/quantization/fx/utils.py
+ install -D -pm 644 ./torch/quantization/fx/quantization_types.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/quantization/fx/quantization_types.py
+ install -D -pm 644 ./torch/quantization/fx/quantization_patterns.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/quantization/fx/quantization_patterns.py
+ install -D -pm 644 ./torch/quantization/fx/prepare.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/quantization/fx/prepare.py
+ install -D -pm 644 ./torch/quantization/fx/pattern_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/quantization/fx/pattern_utils.py
+ install -D -pm 644 ./torch/quantization/fx/match_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/quantization/fx/match_utils.py
+ install -D -pm 644 ./torch/quantization/fx/graph_module.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/quantization/fx/graph_module.py
+ install -D -pm 644 ./torch/quantization/fx/fusion_patterns.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/quantization/fx/fusion_patterns.py
+ install -D -pm 644 ./torch/quantization/fx/fuse.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/quantization/fx/fuse.py
+ install -D -pm 644 ./torch/quantization/fx/convert.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/quantization/fx/convert.py
+ install -D -pm 644 ./torch/quantization/fx/_equalize.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/quantization/fx/_equalize.py
+ install -D -pm 644 ./torch/quantization/fx/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/quantization/fx/__init__.py
+ install -D -pm 644 ./torch/quantization/fuser_method_mappings.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/quantization/fuser_method_mappings.py
+ install -D -pm 644 ./torch/quantization/fuse_modules.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/quantization/fuse_modules.py
+ install -D -pm 644 ./torch/quantization/fake_quantize.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/quantization/fake_quantize.py
+ install -D -pm 644 ./torch/quantization/_quantized_conversions.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/quantization/_quantized_conversions.py
+ install -D -pm 644 ./torch/quantization/_numeric_suite_fx.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/quantization/_numeric_suite_fx.py
+ install -D -pm 644 ./torch/quantization/_numeric_suite.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/quantization/_numeric_suite.py
+ install -D -pm 644 ./torch/quantization/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/quantization/__init__.py
+ install -D -pm 644 ./torch/profiler/python_tracer.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/profiler/python_tracer.py
+ install -D -pm 644 ./torch/profiler/profiler.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/profiler/profiler.py
+ install -D -pm 644 ./torch/profiler/itt.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/profiler/itt.py
+ install -D -pm 644 ./torch/profiler/_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/profiler/_utils.py
+ install -D -pm 644 ./torch/profiler/_pattern_matcher.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/profiler/_pattern_matcher.py
+ install -D -pm 644 ./torch/profiler/_memory_profiler.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/profiler/_memory_profiler.py
+ install -D -pm 644 ./torch/profiler/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/profiler/__init__.py
+ install -D -pm 644 ./torch/package/package_importer.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/package/package_importer.py
+ install -D -pm 644 ./torch/package/package_exporter.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/package/package_exporter.py
+ install -D -pm 644 ./torch/package/importer.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/package/importer.py
+ install -D -pm 644 ./torch/package/glob_group.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/package/glob_group.py
+ install -D -pm 644 ./torch/package/find_file_dependencies.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/package/find_file_dependencies.py
+ install -D -pm 644 ./torch/package/file_structure_representation.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/package/file_structure_representation.py
+ install -D -pm 644 ./torch/package/analyze/trace_dependencies.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/package/analyze/trace_dependencies.py
+ install -D -pm 644 ./torch/package/analyze/is_from_package.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/package/analyze/is_from_package.py
+ install -D -pm 644 ./torch/package/analyze/find_first_use_of_broken_modules.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/package/analyze/find_first_use_of_broken_modules.py
+ install -D -pm 644 ./torch/package/analyze/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/package/analyze/__init__.py
+ install -D -pm 644 ./torch/package/_stdlib.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/package/_stdlib.py
+ install -D -pm 644 ./torch/package/_package_unpickler.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/package/_package_unpickler.py
+ install -D -pm 644 ./torch/package/_package_pickler.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/package/_package_pickler.py
+ install -D -pm 644 ./torch/package/_mock.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/package/_mock.py
+ install -D -pm 644 ./torch/package/_mangling.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/package/_mangling.py
+ install -D -pm 644 ./torch/package/_importlib.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/package/_importlib.py
+ install -D -pm 644 ./torch/package/_directory_reader.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/package/_directory_reader.py
+ install -D -pm 644 ./torch/package/_digraph.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/package/_digraph.py
+ install -D -pm 644 ./torch/package/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/package/__init__.py
+ install -D -pm 644 ./torch/overrides.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/overrides.py
+ install -D -pm 644 ./torch/optim/swa_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/optim/swa_utils.py
+ install -D -pm 644 ./torch/optim/sparse_adam.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/optim/sparse_adam.py
+ install -D -pm 644 ./torch/optim/sgd.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/optim/sgd.py
+ install -D -pm 644 ./torch/optim/rprop.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/optim/rprop.py
+ install -D -pm 644 ./torch/optim/rmsprop.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/optim/rmsprop.py
+ install -D -pm 644 ./torch/optim/radam.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/optim/radam.py
+ install -D -pm 644 ./torch/optim/optimizer.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/optim/optimizer.py
+ install -D -pm 644 ./torch/optim/nadam.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/optim/nadam.py
+ install -D -pm 644 ./torch/optim/lr_scheduler.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/optim/lr_scheduler.py
+ install -D -pm 644 ./torch/optim/lbfgs.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/optim/lbfgs.py
+ install -D -pm 644 ./torch/optim/asgd.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/optim/asgd.py
+ install -D -pm 644 ./torch/optim/adamw.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/optim/adamw.py
+ install -D -pm 644 ./torch/optim/adamax.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/optim/adamax.py
+ install -D -pm 644 ./torch/optim/adam.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/optim/adam.py
+ install -D -pm 644 ./torch/optim/adagrad.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/optim/adagrad.py
+ install -D -pm 644 ./torch/optim/adadelta.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/optim/adadelta.py
+ install -D -pm 644 ./torch/optim/_multi_tensor/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/optim/_multi_tensor/__init__.py
+ install -D -pm 644 ./torch/optim/_functional.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/optim/_functional.py
+ install -D -pm 644 ./torch/optim/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/optim/__init__.py
+ install -D -pm 644 ./torch/onnx/verification.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/verification.py
+ install -D -pm 644 ./torch/onnx/utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/utils.py
+ install -D -pm 644 ./torch/onnx/symbolic_opset9.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/symbolic_opset9.py
+ install -D -pm 644 ./torch/onnx/symbolic_opset8.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/symbolic_opset8.py
+ install -D -pm 644 ./torch/onnx/symbolic_opset7.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/symbolic_opset7.py
+ install -D -pm 644 ./torch/onnx/symbolic_opset20.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/symbolic_opset20.py
+ install -D -pm 644 ./torch/onnx/symbolic_opset19.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/symbolic_opset19.py
+ install -D -pm 644 ./torch/onnx/symbolic_opset18.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/symbolic_opset18.py
+ install -D -pm 644 ./torch/onnx/symbolic_opset17.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/symbolic_opset17.py
+ install -D -pm 644 ./torch/onnx/symbolic_opset16.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/symbolic_opset16.py
+ install -D -pm 644 ./torch/onnx/symbolic_opset15.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/symbolic_opset15.py
+ install -D -pm 644 ./torch/onnx/symbolic_opset14.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/symbolic_opset14.py
+ install -D -pm 644 ./torch/onnx/symbolic_opset13.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/symbolic_opset13.py
+ install -D -pm 644 ./torch/onnx/symbolic_opset12.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/symbolic_opset12.py
+ install -D -pm 644 ./torch/onnx/symbolic_opset11.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/symbolic_opset11.py
+ install -D -pm 644 ./torch/onnx/symbolic_opset10.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/symbolic_opset10.py
+ install -D -pm 644 ./torch/onnx/symbolic_helper.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/symbolic_helper.py
+ install -D -pm 644 ./torch/onnx/symbolic_caffe2.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/symbolic_caffe2.py
+ install -D -pm 644 ./torch/onnx/operators.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/operators.py
+ install -D -pm 644 ./torch/onnx/errors.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/errors.py
+ install -D -pm 644 ./torch/onnx/_type_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_type_utils.py
+ install -D -pm 644 ./torch/onnx/_onnx_supported_ops.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_onnx_supported_ops.py
+ install -D -pm 644 ./torch/onnx/_internal/registration.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/registration.py
+ install -D -pm 644 ./torch/onnx/_internal/onnxruntime.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/onnxruntime.py
+ install -D -pm 644 ./torch/onnx/_internal/onnx_proto_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/onnx_proto_utils.py
+ install -D -pm 644 ./torch/onnx/_internal/jit_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/jit_utils.py
+ install -D -pm 644 ./torch/onnx/_internal/io_adapter.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/io_adapter.py
+ install -D -pm 644 ./torch/onnx/_internal/fx/type_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/fx/type_utils.py
+ install -D -pm 644 ./torch/onnx/_internal/fx/torch_export_graph_extractor.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/fx/torch_export_graph_extractor.py
+ install -D -pm 644 ./torch/onnx/_internal/fx/serialization.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/fx/serialization.py
+ install -D -pm 644 ./torch/onnx/_internal/fx/registration.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/fx/registration.py
+ install -D -pm 644 ./torch/onnx/_internal/fx/patcher.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/fx/patcher.py
+ install -D -pm 644 ./torch/onnx/_internal/fx/passes/virtualization.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/fx/passes/virtualization.py
+ install -D -pm 644 ./torch/onnx/_internal/fx/passes/type_promotion.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/fx/passes/type_promotion.py
+ install -D -pm 644 ./torch/onnx/_internal/fx/passes/readability.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/fx/passes/readability.py
+ install -D -pm 644 ./torch/onnx/_internal/fx/passes/modularization.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/fx/passes/modularization.py
+ install -D -pm 644 ./torch/onnx/_internal/fx/passes/functionalization.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/fx/passes/functionalization.py
+ install -D -pm 644 ./torch/onnx/_internal/fx/passes/decomp.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/fx/passes/decomp.py
+ install -D -pm 644 ./torch/onnx/_internal/fx/passes/_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/fx/passes/_utils.py
+ install -D -pm 644 ./torch/onnx/_internal/fx/passes/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/fx/passes/__init__.py
+ install -D -pm 644 ./torch/onnx/_internal/fx/op_validation.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/fx/op_validation.py
+ install -D -pm 644 ./torch/onnx/_internal/fx/onnxfunction_dispatcher.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/fx/onnxfunction_dispatcher.py
+ install -D -pm 644 ./torch/onnx/_internal/fx/fx_symbolic_graph_extractor.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/fx/fx_symbolic_graph_extractor.py
+ install -D -pm 644 ./torch/onnx/_internal/fx/fx_onnx_interpreter.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/fx/fx_onnx_interpreter.py
+ install -D -pm 644 ./torch/onnx/_internal/fx/dynamo_graph_extractor.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/fx/dynamo_graph_extractor.py
+ install -D -pm 644 ./torch/onnx/_internal/fx/diagnostics.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/fx/diagnostics.py
+ install -D -pm 644 ./torch/onnx/_internal/fx/decomposition_table.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/fx/decomposition_table.py
+ install -D -pm 644 ./torch/onnx/_internal/fx/decomposition_skip.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/fx/decomposition_skip.py
+ install -D -pm 644 ./torch/onnx/_internal/fx/analysis/unsupported_nodes.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/fx/analysis/unsupported_nodes.py
+ install -D -pm 644 ./torch/onnx/_internal/fx/analysis/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/fx/analysis/__init__.py
+ install -D -pm 644 ./torch/onnx/_internal/fx/_pass.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/fx/_pass.py
+ install -D -pm 644 ./torch/onnx/_internal/fx/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/fx/__init__.py
+ install -D -pm 644 ./torch/onnx/_internal/exporter.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/exporter.py
+ install -D -pm 644 ./torch/onnx/_internal/diagnostics/infra/utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/infra/utils.py
+ install -D -pm 644 ./torch/onnx/_internal/diagnostics/infra/sarif/version.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/infra/sarif/version.py
+ install -D -pm 644 ./torch/onnx/_internal/diagnostics/infra/sarif/_web_response.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/infra/sarif/_web_response.py
+ install -D -pm 644 ./torch/onnx/_internal/diagnostics/infra/sarif/_web_request.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/infra/sarif/_web_request.py
+ install -D -pm 644 ./torch/onnx/_internal/diagnostics/infra/sarif/_version_control_details.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/infra/sarif/_version_control_details.py
+ install -D -pm 644 ./torch/onnx/_internal/diagnostics/infra/sarif/_translation_metadata.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/infra/sarif/_translation_metadata.py
+ install -D -pm 644 ./torch/onnx/_internal/diagnostics/infra/sarif/_tool_component_reference.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/infra/sarif/_tool_component_reference.py
+ install -D -pm 644 ./torch/onnx/_internal/diagnostics/infra/sarif/_tool_component.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/infra/sarif/_tool_component.py
+ install -D -pm 644 ./torch/onnx/_internal/diagnostics/infra/sarif/_tool.py
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/infra/sarif/_tool.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/diagnostics/infra/sarif/_thread_flow_location.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/infra/sarif/_thread_flow_location.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/diagnostics/infra/sarif/_thread_flow.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/infra/sarif/_thread_flow.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/diagnostics/infra/sarif/_suppression.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/infra/sarif/_suppression.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/diagnostics/infra/sarif/_stack_frame.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/infra/sarif/_stack_frame.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/diagnostics/infra/sarif/_stack.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/infra/sarif/_stack.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/diagnostics/infra/sarif/_special_locations.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/infra/sarif/_special_locations.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/diagnostics/infra/sarif/_sarif_log.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/infra/sarif/_sarif_log.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/diagnostics/infra/sarif/_run_automation_details.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/infra/sarif/_run_automation_details.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/diagnostics/infra/sarif/_run.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/infra/sarif/_run.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/diagnostics/infra/sarif/_result_provenance.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/infra/sarif/_result_provenance.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/diagnostics/infra/sarif/_result.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/infra/sarif/_result.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/diagnostics/infra/sarif/_reporting_descriptor_relationship.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/infra/sarif/_reporting_descriptor_relationship.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/diagnostics/infra/sarif/_reporting_descriptor_reference.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/infra/sarif/_reporting_descriptor_reference.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/diagnostics/infra/sarif/_reporting_descriptor.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/infra/sarif/_reporting_descriptor.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/diagnostics/infra/sarif/_reporting_configuration.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/infra/sarif/_reporting_configuration.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/diagnostics/infra/sarif/_replacement.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/infra/sarif/_replacement.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/diagnostics/infra/sarif/_region.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/infra/sarif/_region.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/diagnostics/infra/sarif/_rectangle.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/infra/sarif/_rectangle.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/diagnostics/infra/sarif/_property_bag.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/infra/sarif/_property_bag.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/diagnostics/infra/sarif/_physical_location.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/infra/sarif/_physical_location.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/diagnostics/infra/sarif/_notification.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/infra/sarif/_notification.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/diagnostics/infra/sarif/_node.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/infra/sarif/_node.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/diagnostics/infra/sarif/_multiformat_message_string.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/infra/sarif/_multiformat_message_string.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/diagnostics/infra/sarif/_message.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/infra/sarif/_message.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/diagnostics/infra/sarif/_logical_location.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/infra/sarif/_logical_location.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/diagnostics/infra/sarif/_location_relationship.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/infra/sarif/_location_relationship.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/diagnostics/infra/sarif/_location.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/infra/sarif/_location.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/diagnostics/infra/sarif/_invocation.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/infra/sarif/_invocation.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/diagnostics/infra/sarif/_graph_traversal.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/infra/sarif/_graph_traversal.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/diagnostics/infra/sarif/_graph.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/infra/sarif/_graph.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/diagnostics/infra/sarif/_fix.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/infra/sarif/_fix.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/diagnostics/infra/sarif/_external_property_file_references.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/infra/sarif/_external_property_file_references.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/diagnostics/infra/sarif/_external_property_file_reference.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/infra/sarif/_external_property_file_reference.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/diagnostics/infra/sarif/_external_properties.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/infra/sarif/_external_properties.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/diagnostics/infra/sarif/_exception.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/infra/sarif/_exception.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/diagnostics/infra/sarif/_edge_traversal.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/infra/sarif/_edge_traversal.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/diagnostics/infra/sarif/_edge.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/infra/sarif/_edge.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/diagnostics/infra/sarif/_conversion.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/infra/sarif/_conversion.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/diagnostics/infra/sarif/_configuration_override.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/infra/sarif/_configuration_override.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/diagnostics/infra/sarif/_code_flow.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/infra/sarif/_code_flow.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/diagnostics/infra/sarif/_attachment.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/infra/sarif/_attachment.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/diagnostics/infra/sarif/_artifact_location.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/infra/sarif/_artifact_location.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/diagnostics/infra/sarif/_artifact_content.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/infra/sarif/_artifact_content.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/diagnostics/infra/sarif/_artifact_change.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/infra/sarif/_artifact_change.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/diagnostics/infra/sarif/_artifact.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/infra/sarif/_artifact.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/diagnostics/infra/sarif/_address.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/infra/sarif/_address.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/diagnostics/infra/sarif/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/infra/sarif/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/diagnostics/infra/formatter.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/infra/formatter.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/diagnostics/infra/decorator.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/infra/decorator.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/diagnostics/infra/context.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/infra/context.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/diagnostics/infra/_infra.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/infra/_infra.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/diagnostics/infra/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/infra/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/diagnostics/_rules.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/_rules.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/diagnostics/_diagnostic.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/_diagnostic.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/diagnostics/__init__.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/_beartype.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/_beartype.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_globals.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_globals.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_exporter_states.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_exporter_states.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_experimental.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_experimental.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_deprecation.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_deprecation.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_constants.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/_constants.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/__init__.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/onnx/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/utils/weight_norm.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/utils/weight_norm.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/utils/stateless.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/utils/stateless.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/utils/spectral_norm.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/utils/spectral_norm.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/utils/rnn.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/utils/rnn.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/utils/prune.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/utils/prune.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/utils/parametrize.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/utils/parametrize.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/utils/parametrizations.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/utils/parametrizations.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/utils/memory_format.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/utils/memory_format.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/utils/init.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/utils/init.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/utils/fusion.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/utils/fusion.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/utils/convert_parameters.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/utils/convert_parameters.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/utils/clip_grad.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/utils/clip_grad.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/utils/_per_sample_grad.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/utils/_per_sample_grad.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/utils/_named_member_accessor.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/utils/_named_member_accessor.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/utils/_expanded_weights/linear_expanded_weights.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/utils/_expanded_weights/linear_expanded_weights.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 
./torch/nn/utils/_expanded_weights/layer_norm_expanded_weights.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/utils/_expanded_weights/layer_norm_expanded_weights.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/utils/_expanded_weights/instance_norm_expanded_weights.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/utils/_expanded_weights/instance_norm_expanded_weights.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/utils/_expanded_weights/group_norm_expanded_weights.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/utils/_expanded_weights/group_norm_expanded_weights.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/utils/_expanded_weights/expanded_weights_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/utils/_expanded_weights/expanded_weights_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/utils/_expanded_weights/expanded_weights_impl.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/utils/_expanded_weights/expanded_weights_impl.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/utils/_expanded_weights/embedding_expanded_weights.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/utils/_expanded_weights/embedding_expanded_weights.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/utils/_expanded_weights/conv_utils.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/utils/_expanded_weights/conv_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/utils/_expanded_weights/conv_expanded_weights.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/utils/_expanded_weights/conv_expanded_weights.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/utils/_expanded_weights/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/utils/_expanded_weights/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/utils/_deprecation_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/utils/_deprecation_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/utils/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/utils/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/quantized/modules/utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/quantized/modules/utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/quantized/modules/rnn.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/quantized/modules/rnn.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/quantized/modules/normalization.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/quantized/modules/normalization.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/quantized/modules/linear.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/quantized/modules/linear.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/quantized/modules/functional_modules.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/quantized/modules/functional_modules.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/quantized/modules/embedding_ops.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/quantized/modules/embedding_ops.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/quantized/modules/dropout.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/quantized/modules/dropout.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/quantized/modules/conv.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/quantized/modules/conv.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/quantized/modules/batchnorm.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/quantized/modules/batchnorm.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/quantized/modules/activation.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/quantized/modules/activation.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/quantized/modules/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/quantized/modules/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/quantized/functional.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/quantized/functional.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/quantized/dynamic/modules/rnn.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/quantized/dynamic/modules/rnn.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/quantized/dynamic/modules/linear.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/quantized/dynamic/modules/linear.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/quantized/dynamic/modules/conv.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/quantized/dynamic/modules/conv.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/quantized/dynamic/modules/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/quantized/dynamic/modules/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/quantized/dynamic/__init__.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/quantized/dynamic/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/quantized/_reference/modules/utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/quantized/_reference/modules/utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/quantized/_reference/modules/sparse.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/quantized/_reference/modules/sparse.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/quantized/_reference/modules/rnn.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/quantized/_reference/modules/rnn.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/quantized/_reference/modules/linear.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/quantized/_reference/modules/linear.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/quantized/_reference/modules/conv.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/quantized/_reference/modules/conv.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/quantized/_reference/modules/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/quantized/_reference/modules/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/quantized/_reference/__init__.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/quantized/_reference/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/quantized/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/quantized/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/quantizable/modules/rnn.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/quantizable/modules/rnn.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/quantizable/modules/activation.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/quantizable/modules/activation.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/quantizable/modules/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/quantizable/modules/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/quantizable/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/quantizable/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/qat/modules/linear.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/qat/modules/linear.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/qat/modules/embedding_ops.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/qat/modules/embedding_ops.py + for f in `find ./torch/ -name '*.py'` + 
install -D -pm 644 ./torch/nn/qat/modules/conv.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/qat/modules/conv.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/qat/modules/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/qat/modules/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/qat/dynamic/modules/linear.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/qat/dynamic/modules/linear.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/qat/dynamic/modules/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/qat/dynamic/modules/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/qat/dynamic/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/qat/dynamic/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/qat/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/qat/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/parameter.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/parameter.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/parallel/scatter_gather.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/parallel/scatter_gather.py + for f in `find ./torch/ -name '*.py'` + install -D 
-pm 644 ./torch/nn/parallel/replicate.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/parallel/replicate.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/parallel/parallel_apply.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/parallel/parallel_apply.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/parallel/distributed.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/parallel/distributed.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/parallel/data_parallel.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/parallel/data_parallel.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/parallel/comm.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/parallel/comm.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/parallel/_functions.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/parallel/_functions.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/parallel/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/parallel/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/modules/utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/modules/utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 
./torch/nn/modules/upsampling.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/modules/upsampling.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/modules/transformer.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/modules/transformer.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/modules/sparse.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/modules/sparse.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/modules/rnn.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/modules/rnn.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/modules/pooling.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/modules/pooling.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/modules/pixelshuffle.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/modules/pixelshuffle.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/modules/padding.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/modules/padding.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/modules/normalization.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/modules/normalization.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/modules/module.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/modules/module.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/modules/loss.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/modules/loss.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/modules/linear.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/modules/linear.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/modules/lazy.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/modules/lazy.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/modules/instancenorm.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/modules/instancenorm.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/modules/fold.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/modules/fold.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/modules/flatten.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/modules/flatten.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/modules/dropout.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/modules/dropout.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/modules/distance.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/modules/distance.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/modules/conv.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/modules/conv.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/modules/container.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/modules/container.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/modules/channelshuffle.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/modules/channelshuffle.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/modules/batchnorm.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/modules/batchnorm.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/modules/adaptive.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/modules/adaptive.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/modules/activation.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/modules/activation.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/modules/_functions.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/modules/_functions.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/modules/__init__.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/modules/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/intrinsic/quantized/modules/linear_relu.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/intrinsic/quantized/modules/linear_relu.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/intrinsic/quantized/modules/conv_relu.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/intrinsic/quantized/modules/conv_relu.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/intrinsic/quantized/modules/bn_relu.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/intrinsic/quantized/modules/bn_relu.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/intrinsic/quantized/modules/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/intrinsic/quantized/modules/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/intrinsic/quantized/dynamic/modules/linear_relu.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/intrinsic/quantized/dynamic/modules/linear_relu.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/intrinsic/quantized/dynamic/modules/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/intrinsic/quantized/dynamic/modules/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/intrinsic/quantized/dynamic/__init__.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/intrinsic/quantized/dynamic/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/intrinsic/quantized/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/intrinsic/quantized/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/intrinsic/qat/modules/linear_relu.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/intrinsic/qat/modules/linear_relu.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/intrinsic/qat/modules/linear_fused.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/intrinsic/qat/modules/linear_fused.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/intrinsic/qat/modules/conv_fused.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/intrinsic/qat/modules/conv_fused.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/intrinsic/qat/modules/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/intrinsic/qat/modules/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/intrinsic/qat/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/intrinsic/qat/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/intrinsic/modules/fused.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/intrinsic/modules/fused.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/intrinsic/modules/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/intrinsic/modules/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/intrinsic/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/intrinsic/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/init.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/init.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/grad.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/grad.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/functional.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/functional.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/cpp.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/cpp.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/common_types.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/common_types.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/backends/thnn.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/backends/thnn.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/backends/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/backends/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/attention/bias.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/attention/bias.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/attention/_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/attention/_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/attention/_templated_attention.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/attention/_templated_attention.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/attention/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/attention/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/_reduction.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/_reduction.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/nn/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nested/_internal/sdpa.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/nested/_internal/sdpa.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nested/_internal/ops.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/nested/_internal/ops.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nested/_internal/nested_tensor.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/nested/_internal/nested_tensor.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nested/_internal/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/nested/_internal/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nested/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/nested/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/multiprocessing/spawn.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/multiprocessing/spawn.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/multiprocessing/reductions.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/multiprocessing/reductions.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/multiprocessing/queue.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/multiprocessing/queue.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/multiprocessing/pool.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/multiprocessing/pool.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/multiprocessing/_atfork.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/multiprocessing/_atfork.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/multiprocessing/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/multiprocessing/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/mps/profiler.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/mps/profiler.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/mps/event.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/mps/event.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/mps/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/mps/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/monitor/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/monitor/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/masked/maskedtensor/unary.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/masked/maskedtensor/unary.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/masked/maskedtensor/reductions.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/masked/maskedtensor/reductions.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/masked/maskedtensor/passthrough.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/masked/maskedtensor/passthrough.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/masked/maskedtensor/creation.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/masked/maskedtensor/creation.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/masked/maskedtensor/core.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/masked/maskedtensor/core.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/masked/maskedtensor/binary.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/masked/maskedtensor/binary.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/masked/maskedtensor/_ops_refs.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/masked/maskedtensor/_ops_refs.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/masked/maskedtensor/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/masked/maskedtensor/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/masked/_ops.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/masked/_ops.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 
644 ./torch/masked/_docs.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/masked/_docs.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/masked/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/masked/__init__.py
+ install -D -pm 644 ./torch/linalg/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/linalg/__init__.py
+ install -D -pm 644 ./torch/library.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/library.py
+ install -D -pm 644 ./torch/jit/unsupported_tensor_ops.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/jit/unsupported_tensor_ops.py
+ install -D -pm 644 ./torch/jit/supported_ops.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/jit/supported_ops.py
+ install -D -pm 644 ./torch/jit/quantized.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/jit/quantized.py
+ install -D -pm 644 ./torch/jit/mobile/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/jit/mobile/__init__.py
+ install -D -pm 644 ./torch/jit/generate_bytecode.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/jit/generate_bytecode.py
+ install -D -pm 644 ./torch/jit/frontend.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/jit/frontend.py
+ install -D -pm 644 ./torch/jit/annotations.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/jit/annotations.py
+ install -D -pm 644 ./torch/jit/_trace.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/jit/_trace.py
+ install -D -pm 644 ./torch/jit/_state.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/jit/_state.py
+ install -D -pm 644 ./torch/jit/_shape_functions.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/jit/_shape_functions.py
+ install -D -pm 644 ./torch/jit/_serialization.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/jit/_serialization.py
+ install -D -pm 644 ./torch/jit/_script.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/jit/_script.py
+ install -D -pm 644 ./torch/jit/_recursive.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/jit/_recursive.py
+ install -D -pm 644 ./torch/jit/_pickle.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/jit/_pickle.py
+ install -D -pm 644 ./torch/jit/_passes/_property_propagation.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/jit/_passes/_property_propagation.py
+ install -D -pm 644 ./torch/jit/_passes/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/jit/_passes/__init__.py
+ install -D -pm 644 ./torch/jit/_monkeytype_config.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/jit/_monkeytype_config.py
+ install -D -pm 644 ./torch/jit/_logging.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/jit/_logging.py
+ install -D -pm 644 ./torch/jit/_ir_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/jit/_ir_utils.py
+ install -D -pm 644 ./torch/jit/_fuser.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/jit/_fuser.py
+ install -D -pm 644 ./torch/jit/_freeze.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/jit/_freeze.py
+ install -D -pm 644 ./torch/jit/_decompositions.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/jit/_decompositions.py
+ install -D -pm 644 ./torch/jit/_decomposition_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/jit/_decomposition_utils.py
+ install -D -pm 644 ./torch/jit/_dataclass_impls.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/jit/_dataclass_impls.py
+ install -D -pm 644 ./torch/jit/_check.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/jit/_check.py
+ install -D -pm 644 ./torch/jit/_builtins.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/jit/_builtins.py
+ install -D -pm 644 ./torch/jit/_await.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/jit/_await.py
+ install -D -pm 644 ./torch/jit/_async.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/jit/_async.py
+ install -D -pm 644 ./torch/jit/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/jit/__init__.py
+ install -D -pm 644 ./torch/hub.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/hub.py
+ install -D -pm 644 ./torch/fx/traceback.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/traceback.py
+ install -D -pm 644 ./torch/fx/tensor_type.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/tensor_type.py
+ install -D -pm 644 ./torch/fx/subgraph_rewriter.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/subgraph_rewriter.py
+ install -D -pm 644 ./torch/fx/proxy.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/proxy.py
+ install -D -pm 644 ./torch/fx/passes/utils/source_matcher_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/passes/utils/source_matcher_utils.py
+ install -D -pm 644 ./torch/fx/passes/utils/matcher_with_name_node_map_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/passes/utils/matcher_with_name_node_map_utils.py
+ install -D -pm 644 ./torch/fx/passes/utils/matcher_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/passes/utils/matcher_utils.py
+ install -D -pm 644 ./torch/fx/passes/utils/fuser_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/passes/utils/fuser_utils.py
+ install -D -pm 644 ./torch/fx/passes/utils/common.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/passes/utils/common.py
+ install -D -pm 644 ./torch/fx/passes/utils/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/passes/utils/__init__.py
+ install -D -pm 644 ./torch/fx/passes/tools_common.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/passes/tools_common.py
+ install -D -pm 644 ./torch/fx/passes/tests/test_pass_manager.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/passes/tests/test_pass_manager.py
+ install -D -pm 644 ./torch/fx/passes/tests/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/passes/tests/__init__.py
+ install -D -pm 644 ./torch/fx/passes/splitter_base.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/passes/splitter_base.py
+ install -D -pm 644 ./torch/fx/passes/split_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/passes/split_utils.py
+ install -D -pm 644 ./torch/fx/passes/split_module.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/passes/split_module.py
+ install -D -pm 644 ./torch/fx/passes/shape_prop.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/passes/shape_prop.py
+ install -D -pm 644 ./torch/fx/passes/reinplace.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/passes/reinplace.py
+ install -D -pm 644 ./torch/fx/passes/pass_manager.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/passes/pass_manager.py
+ install -D -pm 644 ./torch/fx/passes/param_fetch.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/passes/param_fetch.py
+ install -D -pm 644 ./torch/fx/passes/operator_support.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/passes/operator_support.py
+ install -D -pm 644 ./torch/fx/passes/net_min_base.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/passes/net_min_base.py
+ install -D -pm 644 ./torch/fx/passes/infra/pass_manager.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/passes/infra/pass_manager.py
+ install -D -pm 644 ./torch/fx/passes/infra/pass_base.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/passes/infra/pass_base.py
+ install -D -pm 644 ./torch/fx/passes/infra/partitioner.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/passes/infra/partitioner.py
+ install -D -pm 644 ./torch/fx/passes/infra/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/passes/infra/__init__.py
+ install -D -pm 644 ./torch/fx/passes/graph_manipulation.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/passes/graph_manipulation.py
+ install -D -pm 644 ./torch/fx/passes/graph_drawer.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/passes/graph_drawer.py
+ install -D -pm 644 ./torch/fx/passes/fake_tensor_prop.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/passes/fake_tensor_prop.py
+ install -D -pm 644 ./torch/fx/passes/dialect/common/cse_pass.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/passes/dialect/common/cse_pass.py
+ install -D -pm 644 ./torch/fx/passes/dialect/common/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/passes/dialect/common/__init__.py
+ install -D -pm 644 ./torch/fx/passes/dialect/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/passes/dialect/__init__.py
+ install -D -pm 644 ./torch/fx/passes/backends/cudagraphs.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/passes/backends/cudagraphs.py
+ install -D -pm 644 ./torch/fx/passes/backends/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/passes/backends/__init__.py
+ install -D -pm 644 ./torch/fx/passes/annotate_getitem_nodes.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/passes/annotate_getitem_nodes.py
+ install -D -pm 644 ./torch/fx/passes/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/passes/__init__.py
+ install -D -pm 644 ./torch/fx/operator_schemas.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/operator_schemas.py
+ install -D -pm 644 ./torch/fx/node.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/node.py
+ install -D -pm 644 ./torch/fx/interpreter.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/interpreter.py
+ install -D -pm 644 ./torch/fx/immutable_collections.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/immutable_collections.py
+ install -D -pm 644 ./torch/fx/graph_module.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/graph_module.py
+ install -D -pm 644 ./torch/fx/graph.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/graph.py
+ install -D -pm 644 ./torch/fx/experimental/validator.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/experimental/validator.py
+ install -D -pm 644 ./torch/fx/experimental/unify_refinements.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/experimental/unify_refinements.py
+ install -D -pm 644 ./torch/fx/experimental/unification/variable.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/experimental/unification/variable.py
+ install -D -pm 644 ./torch/fx/experimental/unification/utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/experimental/unification/utils.py
+ install -D -pm 644 ./torch/fx/experimental/unification/unification_tools.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/experimental/unification/unification_tools.py
+ install -D -pm 644 ./torch/fx/experimental/unification/multipledispatch/variadic.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/experimental/unification/multipledispatch/variadic.py
+ install -D -pm 644 ./torch/fx/experimental/unification/multipledispatch/utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/experimental/unification/multipledispatch/utils.py
+ install -D -pm 644 ./torch/fx/experimental/unification/multipledispatch/dispatcher.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/experimental/unification/multipledispatch/dispatcher.py
+ install -D -pm 644 ./torch/fx/experimental/unification/multipledispatch/core.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/experimental/unification/multipledispatch/core.py
+ install -D -pm 644 ./torch/fx/experimental/unification/multipledispatch/conflict.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/experimental/unification/multipledispatch/conflict.py
+ install -D -pm 644 ./torch/fx/experimental/unification/multipledispatch/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/experimental/unification/multipledispatch/__init__.py
+ install -D -pm 644 ./torch/fx/experimental/unification/more.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/experimental/unification/more.py
+ install -D -pm 644 ./torch/fx/experimental/unification/match.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/experimental/unification/match.py
+ install -D -pm 644 ./torch/fx/experimental/unification/dispatch.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/experimental/unification/dispatch.py
+ install -D -pm 644 ./torch/fx/experimental/unification/core.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/experimental/unification/core.py
+ install -D -pm 644 ./torch/fx/experimental/unification/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/experimental/unification/__init__.py
+ install -D -pm 644 ./torch/fx/experimental/symbolic_shapes.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/experimental/symbolic_shapes.py
+ install -D -pm 644 ./torch/fx/experimental/sym_node.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/experimental/sym_node.py
+ install -D -pm 644 ./torch/fx/experimental/shape_inference/infer_symbol_values.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/experimental/shape_inference/infer_symbol_values.py
+ install -D -pm 644 ./torch/fx/experimental/shape_inference/infer_shape.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/experimental/shape_inference/infer_shape.py
+ install -D -pm 644 ./torch/fx/experimental/schema_type_annotation.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/experimental/schema_type_annotation.py
+ install -D -pm 644 ./torch/fx/experimental/rewriter.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/experimental/rewriter.py
+ install -D -pm 644 ./torch/fx/experimental/refinement_types.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/experimental/refinement_types.py
+ install -D -pm 644 ./torch/fx/experimental/recording.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/experimental/recording.py
+ install -D -pm 644 ./torch/fx/experimental/proxy_tensor.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/experimental/proxy_tensor.py
+ install -D -pm 644 ./torch/fx/experimental/partitioner_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/experimental/partitioner_utils.py
+ install -D -pm 644 ./torch/fx/experimental/optimization.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/experimental/optimization.py
+ install -D -pm 644 ./torch/fx/experimental/normalize.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/experimental/normalize.py
+ install -D -pm 644 ./torch/fx/experimental/migrate_gradual_types/z3_types.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/experimental/migrate_gradual_types/z3_types.py
+ install -D -pm 644 ./torch/fx/experimental/migrate_gradual_types/util.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/experimental/migrate_gradual_types/util.py
+ install -D -pm 644 ./torch/fx/experimental/migrate_gradual_types/transform_to_z3.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/experimental/migrate_gradual_types/transform_to_z3.py
+ install -D -pm 644 ./torch/fx/experimental/migrate_gradual_types/operation.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/experimental/migrate_gradual_types/operation.py
+ install -D -pm 644 ./torch/fx/experimental/migrate_gradual_types/constraint_transformation.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/experimental/migrate_gradual_types/constraint_transformation.py
+ install -D -pm 644 ./torch/fx/experimental/migrate_gradual_types/constraint_generator.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/experimental/migrate_gradual_types/constraint_generator.py
+ install -D -pm 644 ./torch/fx/experimental/migrate_gradual_types/constraint.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/experimental/migrate_gradual_types/constraint.py
+ install -D -pm 644 ./torch/fx/experimental/migrate_gradual_types/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/experimental/migrate_gradual_types/__init__.py
+ install -D -pm 644 ./torch/fx/experimental/meta_tracer.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/experimental/meta_tracer.py
+ install -D -pm 644 ./torch/fx/experimental/merge_matmul.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/experimental/merge_matmul.py
+ install -D -pm 644 ./torch/fx/experimental/graph_gradual_typechecker.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/experimental/graph_gradual_typechecker.py
+ install -D -pm 644 ./torch/fx/experimental/debug.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/experimental/debug.py
+ install -D -pm 644 ./torch/fx/experimental/const_fold.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/experimental/const_fold.py
+ install -D -pm 644 ./torch/fx/experimental/accelerator_partitioner.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/experimental/accelerator_partitioner.py
+ install -D -pm 644 ./torch/fx/experimental/_sym_dispatch_mode.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/experimental/_sym_dispatch_mode.py
+ install -D -pm 644 ./torch/fx/experimental/_config.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/experimental/_config.py
+ install -D -pm 644 ./torch/fx/experimental/_backward_state.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/experimental/_backward_state.py
+ install -D -pm 644 ./torch/fx/experimental/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/experimental/__init__.py
+ install -D -pm 644 ./torch/fx/config.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/config.py
+ install -D -pm 644 ./torch/fx/annotate.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/annotate.py
+ install -D -pm 644 ./torch/fx/_symbolic_trace.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/_symbolic_trace.py
+ install -D -pm 644 ./torch/fx/_pytree.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/_pytree.py
+ install -D -pm 644 ./torch/fx/_lazy_graph_module.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/_lazy_graph_module.py
+ install -D -pm 644 ./torch/fx/_compatibility.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/_compatibility.py
+ install -D -pm 644 ./torch/fx/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/fx/__init__.py
+ install -D -pm 644 ./torch/futures/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/futures/__init__.py
+ install -D -pm 644 ./torch/functional.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/functional.py
+ install -D -pm 644 ./torch/func/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/func/__init__.py
+ install -D -pm 644 ./torch/fft/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/fft/__init__.py
+ install -D -pm 644 ./torch/export/unflatten.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/export/unflatten.py
+ install -D -pm 644 ./torch/export/graph_signature.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/export/graph_signature.py
+ install -D -pm 644 ./torch/export/exported_program.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/export/exported_program.py
+ install -D -pm 644 ./torch/export/dynamic_shapes.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/export/dynamic_shapes.py
+ install -D -pm 644 ./torch/export/custom_obj.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/export/custom_obj.py
+ install -D -pm 644 ./torch/export/_unlift.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/export/_unlift.py
+ install -D -pm 644 ./torch/export/_tree_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/export/_tree_utils.py
+ install -D -pm 644 ./torch/export/_trace.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/export/_trace.py
+ install -D -pm 644 ./torch/export/_safeguard.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/export/_safeguard.py
+ install -D -pm 644 ./torch/export/_remove_effect_tokens_pass.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/export/_remove_effect_tokens_pass.py
+ install -D -pm 644 ./torch/export/_remove_auto_functionalized_pass.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/export/_remove_auto_functionalized_pass.py
+ install -D -pm 644 ./torch/export/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/export/__init__.py
+ install -D -pm 644 ./torch/distributions/wishart.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributions/wishart.py
+ install -D -pm 644 ./torch/distributions/weibull.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributions/weibull.py
+ install -D -pm 644 ./torch/distributions/von_mises.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributions/von_mises.py
+ install -D -pm 644 ./torch/distributions/utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributions/utils.py
+ install -D -pm 644 ./torch/distributions/uniform.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributions/uniform.py
+ install -D -pm 644 ./torch/distributions/transforms.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributions/transforms.py
+ install -D -pm 644 ./torch/distributions/transformed_distribution.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributions/transformed_distribution.py
+ install -D -pm 644 ./torch/distributions/studentT.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributions/studentT.py
+ install -D -pm 644 ./torch/distributions/relaxed_categorical.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributions/relaxed_categorical.py
+ install -D -pm 644 ./torch/distributions/relaxed_bernoulli.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributions/relaxed_bernoulli.py
+ install -D -pm 644 ./torch/distributions/poisson.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributions/poisson.py
+ install -D -pm 644 ./torch/distributions/pareto.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributions/pareto.py
+ install -D -pm 644 ./torch/distributions/one_hot_categorical.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributions/one_hot_categorical.py
+ install -D -pm 644 ./torch/distributions/normal.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributions/normal.py
+ install -D -pm 644 ./torch/distributions/negative_binomial.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributions/negative_binomial.py
+ install -D -pm 644 ./torch/distributions/multivariate_normal.py
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributions/multivariate_normal.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributions/multinomial.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributions/multinomial.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributions/mixture_same_family.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributions/mixture_same_family.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributions/lowrank_multivariate_normal.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributions/lowrank_multivariate_normal.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributions/logistic_normal.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributions/logistic_normal.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributions/log_normal.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributions/log_normal.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributions/lkj_cholesky.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributions/lkj_cholesky.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributions/laplace.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributions/laplace.py + for f 
in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributions/kumaraswamy.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributions/kumaraswamy.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributions/kl.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributions/kl.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributions/inverse_gamma.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributions/inverse_gamma.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributions/independent.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributions/independent.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributions/half_normal.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributions/half_normal.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributions/half_cauchy.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributions/half_cauchy.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributions/gumbel.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributions/gumbel.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributions/geometric.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributions/geometric.py + 
for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributions/gamma.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributions/gamma.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributions/fishersnedecor.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributions/fishersnedecor.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributions/exponential.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributions/exponential.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributions/exp_family.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributions/exp_family.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributions/distribution.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributions/distribution.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributions/dirichlet.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributions/dirichlet.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributions/continuous_bernoulli.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributions/continuous_bernoulli.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributions/constraints.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributions/constraints.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributions/constraint_registry.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributions/constraint_registry.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributions/chi2.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributions/chi2.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributions/cauchy.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributions/cauchy.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributions/categorical.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributions/categorical.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributions/binomial.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributions/binomial.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributions/beta.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributions/beta.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributions/bernoulli.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributions/bernoulli.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributions/__init__.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributions/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/tensor/parallel/style.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/tensor/parallel/style.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/tensor/parallel/loss.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/tensor/parallel/loss.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/tensor/parallel/input_reshard.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/tensor/parallel/input_reshard.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/tensor/parallel/fsdp.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/tensor/parallel/fsdp.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/tensor/parallel/ddp.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/tensor/parallel/ddp.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/tensor/parallel/api.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/tensor/parallel/api.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/tensor/parallel/_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/tensor/parallel/_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/tensor/parallel/_data_parallel_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/tensor/parallel/_data_parallel_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/tensor/parallel/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/tensor/parallel/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/tensor/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/tensor/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/run.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/run.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/rpc/server_process_global_profiler.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/rpc/server_process_global_profiler.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/rpc/rref_proxy.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/rpc/rref_proxy.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/rpc/options.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/rpc/options.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/rpc/internal.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/rpc/internal.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/rpc/functions.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/rpc/functions.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/rpc/constants.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/rpc/constants.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/rpc/backend_registry.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/rpc/backend_registry.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/rpc/api.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/rpc/api.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/rpc/_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/rpc/_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 
./torch/distributed/rpc/_testing/faulty_agent_backend_registry.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/rpc/_testing/faulty_agent_backend_registry.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/rpc/_testing/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/rpc/_testing/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/rpc/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/rpc/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/rendezvous.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/rendezvous.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/remote_device.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/remote_device.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/pipeline/sync/worker.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/pipeline/sync/worker.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/pipeline/sync/utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/pipeline/sync/utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/pipeline/sync/stream.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/pipeline/sync/stream.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/pipeline/sync/skip/tracker.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/pipeline/sync/skip/tracker.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/pipeline/sync/skip/skippable.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/pipeline/sync/skip/skippable.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/pipeline/sync/skip/portal.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/pipeline/sync/skip/portal.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/pipeline/sync/skip/namespace.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/pipeline/sync/skip/namespace.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/pipeline/sync/skip/layout.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/pipeline/sync/skip/layout.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/pipeline/sync/skip/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/pipeline/sync/skip/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/pipeline/sync/pipeline.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/pipeline/sync/pipeline.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/pipeline/sync/pipe.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/pipeline/sync/pipe.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/pipeline/sync/phony.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/pipeline/sync/phony.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/pipeline/sync/microbatch.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/pipeline/sync/microbatch.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/pipeline/sync/dependency.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/pipeline/sync/dependency.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/pipeline/sync/copy.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/pipeline/sync/copy.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/pipeline/sync/checkpoint.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/pipeline/sync/checkpoint.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/pipeline/sync/batchnorm.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/pipeline/sync/batchnorm.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/pipeline/sync/_balance/profile.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/pipeline/sync/_balance/profile.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/pipeline/sync/_balance/blockpartition.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/pipeline/sync/_balance/blockpartition.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/pipeline/sync/_balance/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/pipeline/sync/_balance/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/pipeline/sync/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/pipeline/sync/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/pipeline/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/pipeline/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/optim/zero_redundancy_optimizer.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/optim/zero_redundancy_optimizer.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/optim/utils.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/optim/utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/optim/post_localSGD_optimizer.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/optim/post_localSGD_optimizer.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/optim/optimizer.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/optim/optimizer.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/optim/named_optimizer.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/optim/named_optimizer.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/optim/functional_sgd.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/optim/functional_sgd.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/optim/functional_rprop.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/optim/functional_rprop.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/optim/functional_rmsprop.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/optim/functional_rmsprop.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/optim/functional_adamw.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/optim/functional_adamw.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/optim/functional_adamax.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/optim/functional_adamax.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/optim/functional_adam.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/optim/functional_adam.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/optim/functional_adagrad.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/optim/functional_adagrad.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/optim/functional_adadelta.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/optim/functional_adadelta.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/optim/apply_optimizer_in_backward.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/optim/apply_optimizer_in_backward.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/optim/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/optim/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/nn/jit/templates/remote_module_template.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/nn/jit/templates/remote_module_template.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/nn/jit/templates/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/nn/jit/templates/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/nn/jit/instantiator.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/nn/jit/instantiator.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/nn/jit/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/nn/jit/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/nn/functional.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/nn/functional.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/nn/api/remote_module.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/nn/api/remote_module.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/nn/api/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/nn/api/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/nn/__init__.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/nn/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/logging_handlers.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/logging_handlers.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/launcher/api.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/launcher/api.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/launcher/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/launcher/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/launch.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/launch.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/fsdp/wrap.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/fsdp/wrap.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/fsdp/sharded_grad_scaler.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/fsdp/sharded_grad_scaler.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/fsdp/fully_sharded_data_parallel.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/fsdp/fully_sharded_data_parallel.py + for f in `find 
./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/fsdp/api.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/fsdp/api.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/fsdp/_wrap_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/fsdp/_wrap_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/fsdp/_unshard_param_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/fsdp/_unshard_param_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/fsdp/_traversal_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/fsdp/_traversal_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/fsdp/_trace_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/fsdp/_trace_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/fsdp/_state_dict_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/fsdp/_state_dict_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/fsdp/_shard_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/fsdp/_shard_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/fsdp/_runtime_utils.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/fsdp/_runtime_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/fsdp/_optim_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/fsdp/_optim_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/fsdp/_limiter_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/fsdp/_limiter_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/fsdp/_init_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/fsdp/_init_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/fsdp/_fsdp_extensions.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/fsdp/_fsdp_extensions.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/fsdp/_flat_param.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/fsdp/_flat_param.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/fsdp/_exec_order_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/fsdp/_exec_order_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/fsdp/_dynamo_utils.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/fsdp/_dynamo_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/fsdp/_debug_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/fsdp/_debug_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/fsdp/_common_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/fsdp/_common_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/fsdp/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/fsdp/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/examples/memory_tracker_example.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/examples/memory_tracker_example.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/elastic/utils/store.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/elastic/utils/store.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/elastic/utils/logging.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/elastic/utils/logging.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/elastic/utils/log_level.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/elastic/utils/log_level.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/elastic/utils/distributed.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/elastic/utils/distributed.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/elastic/utils/data/elastic_distributed_sampler.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/elastic/utils/data/elastic_distributed_sampler.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/elastic/utils/data/cycling_iterator.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/elastic/utils/data/cycling_iterator.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/elastic/utils/data/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/elastic/utils/data/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/elastic/utils/api.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/elastic/utils/api.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/elastic/utils/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/elastic/utils/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/elastic/timer/local_timer.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/elastic/timer/local_timer.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/elastic/timer/file_based_local_timer.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/elastic/timer/file_based_local_timer.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/elastic/timer/api.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/elastic/timer/api.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/elastic/timer/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/elastic/timer/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/elastic/rendezvous/utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/elastic/rendezvous/utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/elastic/rendezvous/static_tcp_rendezvous.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/elastic/rendezvous/static_tcp_rendezvous.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/elastic/rendezvous/registry.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/elastic/rendezvous/registry.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/elastic/rendezvous/etcd_store.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/elastic/rendezvous/etcd_store.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/elastic/rendezvous/etcd_server.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/elastic/rendezvous/etcd_server.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/elastic/rendezvous/etcd_rendezvous_backend.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/elastic/rendezvous/etcd_rendezvous_backend.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/elastic/rendezvous/etcd_rendezvous.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/elastic/rendezvous/etcd_rendezvous.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/elastic/rendezvous/dynamic_rendezvous.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/elastic/rendezvous/dynamic_rendezvous.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/elastic/rendezvous/c10d_rendezvous_backend.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/elastic/rendezvous/c10d_rendezvous_backend.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/elastic/rendezvous/api.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/elastic/rendezvous/api.py + for f in `find ./torch/ -name '*.py'` + 
install -D -pm 644 ./torch/distributed/elastic/rendezvous/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/elastic/rendezvous/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/elastic/multiprocessing/tail_log.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/elastic/multiprocessing/tail_log.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/elastic/multiprocessing/subprocess_handler/subprocess_handler.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/elastic/multiprocessing/subprocess_handler/subprocess_handler.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/elastic/multiprocessing/subprocess_handler/handlers.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/elastic/multiprocessing/subprocess_handler/handlers.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/elastic/multiprocessing/subprocess_handler/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/elastic/multiprocessing/subprocess_handler/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/elastic/multiprocessing/redirects.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/elastic/multiprocessing/redirects.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/elastic/multiprocessing/errors/handlers.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/elastic/multiprocessing/errors/handlers.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/elastic/multiprocessing/errors/error_handler.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/elastic/multiprocessing/errors/error_handler.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/elastic/multiprocessing/errors/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/elastic/multiprocessing/errors/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/elastic/multiprocessing/api.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/elastic/multiprocessing/api.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/elastic/multiprocessing/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/elastic/multiprocessing/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/elastic/metrics/api.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/elastic/metrics/api.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/elastic/metrics/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/elastic/metrics/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 
./torch/distributed/elastic/events/handlers.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/elastic/events/handlers.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/elastic/events/api.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/elastic/events/api.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/elastic/events/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/elastic/events/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/elastic/agent/server/local_elastic_agent.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/elastic/agent/server/local_elastic_agent.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/elastic/agent/server/health_check_server.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/elastic/agent/server/health_check_server.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/elastic/agent/server/api.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/elastic/agent/server/api.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/elastic/agent/server/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/elastic/agent/server/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 
./torch/distributed/elastic/agent/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/elastic/agent/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/elastic/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/elastic/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/distributed_c10d.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/distributed_c10d.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/device_mesh.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/device_mesh.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/constants.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/constants.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/collective_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/collective_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/checkpoint/utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/checkpoint/utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/checkpoint/storage.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/checkpoint/storage.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/checkpoint/stateful.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/checkpoint/stateful.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/checkpoint/state_dict_saver.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/checkpoint/state_dict_saver.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/checkpoint/state_dict_loader.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/checkpoint/state_dict_loader.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/checkpoint/state_dict.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/checkpoint/state_dict.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/checkpoint/resharding.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/checkpoint/resharding.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/checkpoint/planner_helpers.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/checkpoint/planner_helpers.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/checkpoint/planner.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/checkpoint/planner.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/checkpoint/optimizer.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/checkpoint/optimizer.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/checkpoint/metadata.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/checkpoint/metadata.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/checkpoint/logging_handlers.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/checkpoint/logging_handlers.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/checkpoint/logger.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/checkpoint/logger.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/checkpoint/format_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/checkpoint/format_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/checkpoint/filesystem.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/checkpoint/filesystem.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/checkpoint/examples/stateful_example.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/checkpoint/examples/stateful_example.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/checkpoint/examples/fsdp_checkpoint_example.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/checkpoint/examples/fsdp_checkpoint_example.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/checkpoint/examples/async_checkpointing_example.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/checkpoint/examples/async_checkpointing_example.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/checkpoint/default_planner.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/checkpoint/default_planner.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/checkpoint/api.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/checkpoint/api.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/checkpoint/_traverse.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/checkpoint/_traverse.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/checkpoint/_storage_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/checkpoint/_storage_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 
./torch/distributed/checkpoint/_sharded_tensor_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/checkpoint/_sharded_tensor_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/checkpoint/_nested_dict.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/checkpoint/_nested_dict.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/checkpoint/_fsspec_filesystem.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/checkpoint/_fsspec_filesystem.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/checkpoint/_dedup_tensors.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/checkpoint/_dedup_tensors.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/checkpoint/_dedup_save_plans.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/checkpoint/_dedup_save_plans.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/checkpoint/_checkpointer.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/checkpoint/_checkpointer.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/checkpoint/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/checkpoint/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/c10d_logger.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/c10d_logger.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/benchmarks/benchmark_ddp_rpc.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/benchmarks/benchmark_ddp_rpc.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/autograd/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/autograd/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/argparse_util.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/argparse_util.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/algorithms/model_averaging/utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/algorithms/model_averaging/utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/algorithms/model_averaging/hierarchical_model_averager.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/algorithms/model_averaging/hierarchical_model_averager.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/algorithms/model_averaging/averagers.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/algorithms/model_averaging/averagers.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 
./torch/distributed/algorithms/model_averaging/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/algorithms/model_averaging/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/algorithms/join.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/algorithms/join.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/algorithms/ddp_comm_hooks/quantization_hooks.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/algorithms/ddp_comm_hooks/quantization_hooks.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/algorithms/ddp_comm_hooks/powerSGD_hook.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/algorithms/ddp_comm_hooks/powerSGD_hook.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/algorithms/ddp_comm_hooks/post_localSGD_hook.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/algorithms/ddp_comm_hooks/post_localSGD_hook.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/algorithms/ddp_comm_hooks/optimizer_overlap_hooks.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/algorithms/ddp_comm_hooks/optimizer_overlap_hooks.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/algorithms/ddp_comm_hooks/mixed_precision_hooks.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/algorithms/ddp_comm_hooks/mixed_precision_hooks.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/algorithms/ddp_comm_hooks/default_hooks.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/algorithms/ddp_comm_hooks/default_hooks.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/algorithms/ddp_comm_hooks/debugging_hooks.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/algorithms/ddp_comm_hooks/debugging_hooks.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/algorithms/ddp_comm_hooks/ddp_zero_hook.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/algorithms/ddp_comm_hooks/ddp_zero_hook.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/algorithms/ddp_comm_hooks/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/algorithms/ddp_comm_hooks/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/algorithms/_quantization/quantization.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/algorithms/_quantization/quantization.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/algorithms/_quantization/__init__.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/algorithms/_quantization/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/algorithms/_optimizer_overlap/optimizer_overlap.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/algorithms/_optimizer_overlap/optimizer_overlap.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/algorithms/_optimizer_overlap/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/algorithms/_optimizer_overlap/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/algorithms/_comm_hooks/default_hooks.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/algorithms/_comm_hooks/default_hooks.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/algorithms/_comm_hooks/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/algorithms/_comm_hooks/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/algorithms/_checkpoint/checkpoint_wrapper.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/algorithms/_checkpoint/checkpoint_wrapper.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/algorithms/_checkpoint/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/algorithms/_checkpoint/__init__.py + for f 
in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/algorithms/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/algorithms/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_tools/memory_tracker.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_tools/memory_tracker.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_tools/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_tools/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_tensor/tp_conv.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_tensor/tp_conv.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_tensor/sharding_prop.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_tensor/sharding_prop.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_tensor/redistribute.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_tensor/redistribute.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_tensor/random.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_tensor/random.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_tensor/placement_types.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_tensor/placement_types.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_tensor/ops/view_ops.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_tensor/ops/view_ops.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_tensor/ops/utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_tensor/ops/utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_tensor/ops/tensor_ops.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_tensor/ops/tensor_ops.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_tensor/ops/random_ops.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_tensor/ops/random_ops.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_tensor/ops/pointwise_ops.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_tensor/ops/pointwise_ops.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_tensor/ops/matrix_ops.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_tensor/ops/matrix_ops.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_tensor/ops/math_ops.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_tensor/ops/math_ops.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_tensor/ops/experimental_ops.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_tensor/ops/experimental_ops.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_tensor/ops/embedding_ops.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_tensor/ops/embedding_ops.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_tensor/ops/conv_ops.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_tensor/ops/conv_ops.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_tensor/ops/common_rules.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_tensor/ops/common_rules.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_tensor/ops/basic_strategy.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_tensor/ops/basic_strategy.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_tensor/ops/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_tensor/ops/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_tensor/op_schema.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_tensor/op_schema.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_tensor/experimental/tp_transform.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_tensor/experimental/tp_transform.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_tensor/experimental/attention.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_tensor/experimental/attention.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_tensor/experimental/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_tensor/experimental/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_tensor/examples/visualize_sharding_example.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_tensor/examples/visualize_sharding_example.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_tensor/examples/torchrec_sharding_example.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_tensor/examples/torchrec_sharding_example.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_tensor/examples/convnext_example.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_tensor/examples/convnext_example.py + for f in `find ./torch/ -name '*.py'` + install 
-D -pm 644 ./torch/distributed/_tensor/examples/checkpoint_example.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_tensor/examples/checkpoint_example.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_tensor/dispatch.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_tensor/dispatch.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_tensor/device_mesh.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_tensor/device_mesh.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_tensor/debug/visualize_sharding.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_tensor/debug/visualize_sharding.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_tensor/debug/op_coverage.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_tensor/debug/op_coverage.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_tensor/debug/comm_mode.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_tensor/debug/comm_mode.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_tensor/debug/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_tensor/debug/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_tensor/api.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_tensor/api.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_tensor/_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_tensor/_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_tensor/_collective_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_tensor/_collective_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_tensor/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_tensor/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_state_dict_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_state_dict_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_spmd/partial_lower.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_spmd/partial_lower.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_spmd/parallel_mode.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_spmd/parallel_mode.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_spmd/log_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_spmd/log_utils.py 
+ for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_spmd/iter_graph_module.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_spmd/iter_graph_module.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_spmd/graph_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_spmd/graph_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_spmd/graph_optimization.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_spmd/graph_optimization.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_spmd/gm_transformation.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_spmd/gm_transformation.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_spmd/experimental_ops.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_spmd/experimental_ops.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_spmd/distribute.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_spmd/distribute.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_spmd/data_parallel.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_spmd/data_parallel.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_spmd/config.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_spmd/config.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_spmd/comm_tensor.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_spmd/comm_tensor.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_spmd/batch_dim_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_spmd/batch_dim_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_spmd/api.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_spmd/api.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_spmd/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_spmd/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_sharding_spec/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_sharding_spec/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_sharded_tensor/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_sharded_tensor/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_shard/sharding_spec/chunk_sharding_spec_ops/embedding_bag.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_shard/sharding_spec/chunk_sharding_spec_ops/embedding_bag.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_shard/sharding_spec/chunk_sharding_spec_ops/embedding.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_shard/sharding_spec/chunk_sharding_spec_ops/embedding.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_shard/sharding_spec/chunk_sharding_spec_ops/_common.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_shard/sharding_spec/chunk_sharding_spec_ops/_common.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_shard/sharding_spec/chunk_sharding_spec_ops/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_shard/sharding_spec/chunk_sharding_spec_ops/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_shard/sharding_spec/chunk_sharding_spec.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_shard/sharding_spec/chunk_sharding_spec.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_shard/sharding_spec/api.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_shard/sharding_spec/api.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_shard/sharding_spec/_internals.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_shard/sharding_spec/_internals.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_shard/sharding_spec/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_shard/sharding_spec/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_shard/sharding_plan/api.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_shard/sharding_plan/api.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_shard/sharding_plan/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_shard/sharding_plan/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_shard/sharder.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_shard/sharder.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_shard/sharded_tensor/utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_shard/sharded_tensor/utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_shard/sharded_tensor/shard.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_shard/sharded_tensor/shard.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_shard/sharded_tensor/reshard.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_shard/sharded_tensor/reshard.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_shard/sharded_tensor/metadata.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_shard/sharded_tensor/metadata.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_shard/sharded_tensor/logging_handlers.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_shard/sharded_tensor/logging_handlers.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_shard/sharded_tensor/logger.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_shard/sharded_tensor/logger.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_shard/sharded_tensor/api.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_shard/sharded_tensor/api.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_shard/sharded_tensor/_ops/tensor_ops.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_shard/sharded_tensor/_ops/tensor_ops.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_shard/sharded_tensor/_ops/misc_ops.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_shard/sharded_tensor/_ops/misc_ops.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 
./torch/distributed/_shard/sharded_tensor/_ops/init.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_shard/sharded_tensor/_ops/init.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_shard/sharded_tensor/_ops/binary_cmp.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_shard/sharded_tensor/_ops/binary_cmp.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_shard/sharded_tensor/_ops/_common.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_shard/sharded_tensor/_ops/_common.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_shard/sharded_tensor/_ops/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_shard/sharded_tensor/_ops/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_shard/sharded_tensor/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_shard/sharded_tensor/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_shard/sharded_optim/api.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_shard/sharded_optim/api.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_shard/sharded_optim/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_shard/sharded_optim/__init__.py + for f in `find 
./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_shard/op_registry_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_shard/op_registry_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_shard/metadata.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_shard/metadata.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_shard/common_op_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_shard/common_op_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_shard/checkpoint/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_shard/checkpoint/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_shard/api.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_shard/api.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_shard/_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_shard/_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_shard/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_shard/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_functional_collectives_impl.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_functional_collectives_impl.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_functional_collectives.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_functional_collectives.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_composable_state.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_composable_state.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_composable/replicate.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_composable/replicate.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_composable/fully_shard.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_composable/fully_shard.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_composable/fsdp/fully_shard.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_composable/fsdp/fully_shard.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_composable/fsdp/_fsdp_state.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_composable/fsdp/_fsdp_state.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_composable/fsdp/_fsdp_param_group.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_composable/fsdp/_fsdp_param_group.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/distributed/_composable/fsdp/_fsdp_param.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_composable/fsdp/_fsdp_param.py
+ install -D -pm 644 ./torch/distributed/_composable/fsdp/_fsdp_init.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_composable/fsdp/_fsdp_init.py
+ install -D -pm 644 ./torch/distributed/_composable/fsdp/_fsdp_common.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_composable/fsdp/_fsdp_common.py
+ install -D -pm 644 ./torch/distributed/_composable/fsdp/_fsdp_collectives.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_composable/fsdp/_fsdp_collectives.py
+ install -D -pm 644 ./torch/distributed/_composable/fsdp/_fsdp_api.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_composable/fsdp/_fsdp_api.py
+ install -D -pm 644 ./torch/distributed/_composable/fsdp/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_composable/fsdp/__init__.py
+ install -D -pm 644 ./torch/distributed/_composable/contract.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_composable/contract.py
+ install -D -pm 644 ./torch/distributed/_composable/checkpoint_activation.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_composable/checkpoint_activation.py
+ install -D -pm 644 ./torch/distributed/_composable/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/_composable/__init__.py
+ install -D -pm 644 ./torch/distributed/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/distributed/__init__.py
+ install -D -pm 644 ./torch/cuda/streams.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/cuda/streams.py
+ install -D -pm 644 ./torch/cuda/sparse.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/cuda/sparse.py
+ install -D -pm 644 ./torch/cuda/random.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/cuda/random.py
+ install -D -pm 644 ./torch/cuda/profiler.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/cuda/profiler.py
+ install -D -pm 644 ./torch/cuda/nvtx.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/cuda/nvtx.py
+ install -D -pm 644 ./torch/cuda/nccl.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/cuda/nccl.py
+ install -D -pm 644 ./torch/cuda/memory.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/cuda/memory.py
+ install -D -pm 644 ./torch/cuda/jiterator.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/cuda/jiterator.py
+ install -D -pm 644 ./torch/cuda/graphs.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/cuda/graphs.py
+ install -D -pm 644 ./torch/cuda/error.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/cuda/error.py
+ install -D -pm 644 ./torch/cuda/comm.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/cuda/comm.py
+ install -D -pm 644 ./torch/cuda/amp/grad_scaler.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/cuda/amp/grad_scaler.py
+ install -D -pm 644 ./torch/cuda/amp/common.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/cuda/amp/common.py
+ install -D -pm 644 ./torch/cuda/amp/autocast_mode.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/cuda/amp/autocast_mode.py
+ install -D -pm 644 ./torch/cuda/amp/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/cuda/amp/__init__.py
+ install -D -pm 644 ./torch/cuda/_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/cuda/_utils.py
+ install -D -pm 644 ./torch/cuda/_sanitizer.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/cuda/_sanitizer.py
+ install -D -pm 644 ./torch/cuda/_memory_viz.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/cuda/_memory_viz.py
+ install -D -pm 644 ./torch/cuda/_gpu_trace.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/cuda/_gpu_trace.py
+ install -D -pm 644 ./torch/cuda/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/cuda/__init__.py
+ install -D -pm 644 ./torch/csrc/lazy/test_mnist.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/csrc/lazy/test_mnist.py
+ install -D -pm 644 ./torch/csrc/jit/tensorexpr/scripts/bisect.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/csrc/jit/tensorexpr/scripts/bisect.py
+ install -D -pm 644 ./torch/csrc/jit/tensorexpr/codegen_external.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/csrc/jit/tensorexpr/codegen_external.py
+ install -D -pm 644 ./torch/cpu/amp/grad_scaler.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/cpu/amp/grad_scaler.py
+ install -D -pm 644 ./torch/cpu/amp/autocast_mode.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/cpu/amp/autocast_mode.py
+ install -D -pm 644 ./torch/cpu/amp/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/cpu/amp/__init__.py
+ install -D -pm 644 ./torch/cpu/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/cpu/__init__.py
+ install -D -pm 644 ./torch/contrib/_tensorboard_vis.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/contrib/_tensorboard_vis.py
+ install -D -pm 644 ./torch/contrib/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/contrib/__init__.py
+ install -D -pm 644 ./torch/compiler/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/compiler/__init__.py
+ install -D -pm 644 ./torch/backends/xnnpack/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/backends/xnnpack/__init__.py
+ install -D -pm 644 ./torch/backends/xeon/run_cpu.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/backends/xeon/run_cpu.py
+ install -D -pm 644 ./torch/backends/xeon/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/backends/xeon/__init__.py
+ install -D -pm 644 ./torch/backends/quantized/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/backends/quantized/__init__.py
+ install -D -pm 644 ./torch/backends/opt_einsum/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/backends/opt_einsum/__init__.py
+ install -D -pm 644 ./torch/backends/openmp/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/backends/openmp/__init__.py
+ install -D -pm 644 ./torch/backends/nnpack/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/backends/nnpack/__init__.py
+ install -D -pm 644 ./torch/backends/mps/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/backends/mps/__init__.py
+ install -D -pm 644 ./torch/backends/mkldnn/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/backends/mkldnn/__init__.py
+ install -D -pm 644 ./torch/backends/mkl/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/backends/mkl/__init__.py
+ install -D -pm 644 ./torch/backends/mha/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/backends/mha/__init__.py
+ install -D -pm 644 ./torch/backends/cudnn/rnn.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/backends/cudnn/rnn.py
+ install -D -pm 644 ./torch/backends/cudnn/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/backends/cudnn/__init__.py
+ install -D -pm 644 ./torch/backends/cuda/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/backends/cuda/__init__.py
+ install -D -pm 644 ./torch/backends/cpu/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/backends/cpu/__init__.py
+ install -D -pm 644 ./torch/backends/_nnapi/serializer.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/backends/_nnapi/serializer.py
+ install -D -pm 644 ./torch/backends/_nnapi/prepare.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/backends/_nnapi/prepare.py
+ install -D -pm 644 ./torch/backends/_nnapi/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/backends/_nnapi/__init__.py
+ install -D -pm 644 ./torch/backends/_coreml/preprocess.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/backends/_coreml/preprocess.py
+ install -D -pm 644 ./torch/backends/_coreml/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/backends/_coreml/__init__.py
+ install -D -pm 644 ./torch/backends/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/backends/__init__.py
+ install -D -pm 644 ./torch/autograd/variable.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/autograd/variable.py
+ install -D -pm 644 ./torch/autograd/profiler_util.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/autograd/profiler_util.py
+ install -D -pm 644 ./torch/autograd/profiler_legacy.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/autograd/profiler_legacy.py
+ install -D -pm 644 ./torch/autograd/profiler.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/autograd/profiler.py
+ install -D -pm 644 ./torch/autograd/graph.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/autograd/graph.py
+ install -D -pm 644 ./torch/autograd/gradcheck.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/autograd/gradcheck.py
+ install -D -pm 644 ./torch/autograd/grad_mode.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/autograd/grad_mode.py
+ install -D -pm 644 ./torch/autograd/functional.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/autograd/functional.py
+ install -D -pm 644 ./torch/autograd/function.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/autograd/function.py
+ install -D -pm 644 ./torch/autograd/forward_ad.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/autograd/forward_ad.py
+ install -D -pm 644 ./torch/autograd/anomaly_mode.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/autograd/anomaly_mode.py
+ install -D -pm 644 ./torch/autograd/_functions/utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/autograd/_functions/utils.py
+ install -D -pm 644 ./torch/autograd/_functions/tensor.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/autograd/_functions/tensor.py
+ install -D -pm 644 ./torch/autograd/_functions/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/autograd/_functions/__init__.py
+ install -D -pm 644 ./torch/autograd/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/autograd/__init__.py
+ install -D -pm 644 ./torch/ao/quantization/utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/utils.py
+ install -D -pm 644 ./torch/ao/quantization/stubs.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/stubs.py
+ install -D -pm 644 ./torch/ao/quantization/quantizer/xnnpack_quantizer_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/quantizer/xnnpack_quantizer_utils.py
+ install -D -pm 644 ./torch/ao/quantization/quantizer/xnnpack_quantizer.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/quantizer/xnnpack_quantizer.py
+ install -D -pm 644 ./torch/ao/quantization/quantizer/x86_inductor_quantizer.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/quantizer/x86_inductor_quantizer.py
+ install -D -pm 644 ./torch/ao/quantization/quantizer/utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/quantizer/utils.py
+ install -D -pm 644 ./torch/ao/quantization/quantizer/quantizer.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/quantizer/quantizer.py
+ install -D -pm 644 ./torch/ao/quantization/quantizer/embedding_quantizer.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/quantizer/embedding_quantizer.py
+ install -D -pm 644 ./torch/ao/quantization/quantizer/composable_quantizer.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/quantizer/composable_quantizer.py
+ install -D -pm 644 ./torch/ao/quantization/quantizer/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/quantizer/__init__.py
+ install -D -pm 644 ./torch/ao/quantization/quantize_pt2e.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/quantize_pt2e.py
+ install -D -pm 644 ./torch/ao/quantization/quantize_jit.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/quantize_jit.py
+ install -D -pm 644 ./torch/ao/quantization/quantize_fx.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/quantize_fx.py
+ install -D -pm 644 ./torch/ao/quantization/quantize.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/quantize.py
+ install -D -pm 644 ./torch/ao/quantization/quantization_mappings.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/quantization_mappings.py
+ install -D -pm 644 ./torch/ao/quantization/quant_type.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/quant_type.py
+ install -D -pm 644 ./torch/ao/quantization/qconfig_mapping.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/qconfig_mapping.py
+ install -D -pm 644 ./torch/ao/quantization/qconfig.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/qconfig.py
+ install -D -pm 644 ./torch/ao/quantization/pt2e/utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/pt2e/utils.py
+ install -D -pm 644 ./torch/ao/quantization/pt2e/representation/rewrite.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/pt2e/representation/rewrite.py
+ install -D -pm 644 ./torch/ao/quantization/pt2e/representation/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/pt2e/representation/__init__.py
+ install -D -pm 644 ./torch/ao/quantization/pt2e/qat_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/pt2e/qat_utils.py
+ install -D -pm 644 ./torch/ao/quantization/pt2e/prepare.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/pt2e/prepare.py
+ install -D -pm 644 ./torch/ao/quantization/pt2e/port_metadata_pass.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/pt2e/port_metadata_pass.py
+ install -D -pm 644 ./torch/ao/quantization/pt2e/graph_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/pt2e/graph_utils.py
+ install -D -pm 644 ./torch/ao/quantization/pt2e/generate_numeric_debug_handle.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/pt2e/generate_numeric_debug_handle.py
+ install -D -pm 644 ./torch/ao/quantization/pt2e/export_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/pt2e/export_utils.py
+ install -D -pm 644 ./torch/ao/quantization/pt2e/duplicate_dq_pass.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/pt2e/duplicate_dq_pass.py
+ install -D -pm 644 ./torch/ao/quantization/pt2e/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/pt2e/__init__.py
+ install -D -pm 644 ./torch/ao/quantization/observer.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/observer.py
+ install -D -pm 644 ./torch/ao/quantization/fx/utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/fx/utils.py
+ install -D -pm 644 ./torch/ao/quantization/fx/tracer.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/fx/tracer.py
+ install -D -pm 644 ./torch/ao/quantization/fx/quantize_handler.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/fx/quantize_handler.py
+ install -D -pm 644 ./torch/ao/quantization/fx/qconfig_mapping_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/fx/qconfig_mapping_utils.py
+ install -D -pm 644 ./torch/ao/quantization/fx/prepare.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/fx/prepare.py
+ install -D -pm 644 ./torch/ao/quantization/fx/pattern_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/fx/pattern_utils.py
+ install -D -pm 644 ./torch/ao/quantization/fx/match_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/fx/match_utils.py
+ install -D -pm 644 ./torch/ao/quantization/fx/lstm_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/fx/lstm_utils.py
+ install -D -pm 644 ./torch/ao/quantization/fx/lower_to_qnnpack.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/fx/lower_to_qnnpack.py
+ install -D -pm 644 ./torch/ao/quantization/fx/lower_to_fbgemm.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/fx/lower_to_fbgemm.py
+ install -D -pm 644 ./torch/ao/quantization/fx/graph_module.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/fx/graph_module.py
+ install -D -pm 644 ./torch/ao/quantization/fx/fuse_handler.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/fx/fuse_handler.py
+ install -D -pm 644 ./torch/ao/quantization/fx/fuse.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/fx/fuse.py
+ install -D -pm 644 ./torch/ao/quantization/fx/custom_config.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/fx/custom_config.py
+ install -D -pm 644 ./torch/ao/quantization/fx/convert.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/fx/convert.py
+ install -D -pm 644 ./torch/ao/quantization/fx/_model_report/model_report_visualizer.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/fx/_model_report/model_report_visualizer.py
+ install -D -pm 644 ./torch/ao/quantization/fx/_model_report/model_report_observer.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/fx/_model_report/model_report_observer.py
+ install -D -pm 644 ./torch/ao/quantization/fx/_model_report/model_report.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/fx/_model_report/model_report.py
+ install -D -pm 644 ./torch/ao/quantization/fx/_model_report/detector.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/fx/_model_report/detector.py
+ install -D -pm 644 ./torch/ao/quantization/fx/_model_report/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/fx/_model_report/__init__.py
+ install -D -pm 644 ./torch/ao/quantization/fx/_lower_to_native_backend.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/fx/_lower_to_native_backend.py
+ install -D -pm 644 ./torch/ao/quantization/fx/_equalize.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/fx/_equalize.py
+ install -D -pm 644 ./torch/ao/quantization/fx/_decomposed.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/fx/_decomposed.py
+ install -D -pm 644 ./torch/ao/quantization/fx/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/fx/__init__.py
+ install -D -pm 644 ./torch/ao/quantization/fuser_method_mappings.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/fuser_method_mappings.py
+ install -D -pm 644 ./torch/ao/quantization/fuse_modules.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/fuse_modules.py
+ install -D -pm 644 ./torch/ao/quantization/fake_quantize.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/fake_quantize.py
+ install -D -pm 644 ./torch/ao/quantization/experimental/quantizer.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/experimental/quantizer.py
+ install -D -pm 644 ./torch/ao/quantization/experimental/qconfig.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/experimental/qconfig.py
+ install -D -pm 644 ./torch/ao/quantization/experimental/observer.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/experimental/observer.py
+ install -D -pm 644 ./torch/ao/quantization/experimental/linear.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/experimental/linear.py
+ install -D -pm 644 ./torch/ao/quantization/experimental/fake_quantize_function.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/experimental/fake_quantize_function.py
+ install -D -pm 644 ./torch/ao/quantization/experimental/fake_quantize.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/experimental/fake_quantize.py
+ install -D -pm 644 ./torch/ao/quantization/experimental/apot_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/experimental/apot_utils.py
+ install -D -pm 644 ./torch/ao/quantization/experimental/APoT_tensor.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/experimental/APoT_tensor.py
+ install -D -pm 644 ./torch/ao/quantization/backend_config/x86.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/backend_config/x86.py
+ install -D -pm 644 ./torch/ao/quantization/backend_config/utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/backend_config/utils.py
+ install -D -pm 644 ./torch/ao/quantization/backend_config/tensorrt.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/backend_config/tensorrt.py
+ install -D -pm 644 ./torch/ao/quantization/backend_config/qnnpack.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/backend_config/qnnpack.py
+ install -D -pm 644 ./torch/ao/quantization/backend_config/onednn.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/backend_config/onednn.py
+ install -D -pm 644 ./torch/ao/quantization/backend_config/observation_type.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/backend_config/observation_type.py
+ install -D -pm 644 ./torch/ao/quantization/backend_config/native.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/backend_config/native.py
+ install -D -pm 644 ./torch/ao/quantization/backend_config/fbgemm.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/backend_config/fbgemm.py
+ install -D -pm 644 ./torch/ao/quantization/backend_config/executorch.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/backend_config/executorch.py
+ install -D -pm 644 ./torch/ao/quantization/backend_config/backend_config.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/backend_config/backend_config.py
+ install -D -pm 644 ./torch/ao/quantization/backend_config/_qnnpack_pt2e.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/backend_config/_qnnpack_pt2e.py
+ install -D -pm 644 ./torch/ao/quantization/backend_config/_common_operator_config_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/backend_config/_common_operator_config_utils.py
+ install -D -pm 644 ./torch/ao/quantization/backend_config/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/backend_config/__init__.py
+ install -D -pm 644 ./torch/ao/quantization/_learnable_fake_quantize.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/_learnable_fake_quantize.py
+ install -D -pm 644 ./torch/ao/quantization/_equalize.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/_equalize.py
+ install -D -pm 644
./torch/ao/quantization/_correct_bias.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/_correct_bias.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/quantization/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/pruning/sparsifier/weight_norm_sparsifier.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/pruning/sparsifier/weight_norm_sparsifier.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/pruning/sparsifier/utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/pruning/sparsifier/utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/pruning/sparsifier/nearly_diagonal_sparsifier.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/pruning/sparsifier/nearly_diagonal_sparsifier.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/pruning/sparsifier/base_sparsifier.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/pruning/sparsifier/base_sparsifier.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/pruning/sparsifier/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/pruning/sparsifier/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/pruning/scheduler/lambda_scheduler.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/pruning/scheduler/lambda_scheduler.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/pruning/scheduler/cubic_scheduler.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/pruning/scheduler/cubic_scheduler.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/pruning/scheduler/base_scheduler.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/pruning/scheduler/base_scheduler.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/pruning/scheduler/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/pruning/scheduler/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/pruning/_mappings.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/pruning/_mappings.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/pruning/_experimental/pruner/saliency_pruner.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/pruning/_experimental/pruner/saliency_pruner.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/pruning/_experimental/pruner/prune_functions.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/pruning/_experimental/pruner/prune_functions.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/pruning/_experimental/pruner/parametrization.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/pruning/_experimental/pruner/parametrization.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/pruning/_experimental/pruner/match_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/pruning/_experimental/pruner/match_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/pruning/_experimental/pruner/lstm_saliency_pruner.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/pruning/_experimental/pruner/lstm_saliency_pruner.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/pruning/_experimental/pruner/base_structured_sparsifier.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/pruning/_experimental/pruner/base_structured_sparsifier.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/pruning/_experimental/pruner/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/pruning/_experimental/pruner/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/pruning/_experimental/pruner/FPGM_pruner.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/pruning/_experimental/pruner/FPGM_pruner.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/pruning/_experimental/data_sparsifier/quantization_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/pruning/_experimental/data_sparsifier/quantization_utils.py + for f 
in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/pruning/_experimental/data_sparsifier/lightning/tests/test_callbacks.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/pruning/_experimental/data_sparsifier/lightning/tests/test_callbacks.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/pruning/_experimental/data_sparsifier/lightning/callbacks/data_sparsity.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/pruning/_experimental/data_sparsifier/lightning/callbacks/data_sparsity.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/pruning/_experimental/data_sparsifier/lightning/callbacks/_data_sparstity_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/pruning/_experimental/data_sparsifier/lightning/callbacks/_data_sparstity_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/pruning/_experimental/data_sparsifier/lightning/callbacks/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/pruning/_experimental/data_sparsifier/lightning/callbacks/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/pruning/_experimental/data_sparsifier/lightning/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/pruning/_experimental/data_sparsifier/lightning/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/pruning/_experimental/data_sparsifier/data_norm_sparsifier.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/pruning/_experimental/data_sparsifier/data_norm_sparsifier.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/pruning/_experimental/data_sparsifier/benchmarks/evaluate_model_metrics.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/pruning/_experimental/data_sparsifier/benchmarks/evaluate_model_metrics.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/pruning/_experimental/data_sparsifier/benchmarks/evaluate_forward_time.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/pruning/_experimental/data_sparsifier/benchmarks/evaluate_forward_time.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/pruning/_experimental/data_sparsifier/benchmarks/evaluate_disk_savings.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/pruning/_experimental/data_sparsifier/benchmarks/evaluate_disk_savings.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/pruning/_experimental/data_sparsifier/benchmarks/dlrm_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/pruning/_experimental/data_sparsifier/benchmarks/dlrm_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/pruning/_experimental/data_sparsifier/base_data_sparsifier.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/pruning/_experimental/data_sparsifier/base_data_sparsifier.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 
./torch/ao/pruning/_experimental/data_sparsifier/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/pruning/_experimental/data_sparsifier/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/pruning/_experimental/data_scheduler/base_data_scheduler.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/pruning/_experimental/data_scheduler/base_data_scheduler.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/pruning/_experimental/data_scheduler/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/pruning/_experimental/data_scheduler/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/pruning/_experimental/activation_sparsifier/activation_sparsifier.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/pruning/_experimental/activation_sparsifier/activation_sparsifier.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/pruning/_experimental/activation_sparsifier/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/pruning/_experimental/activation_sparsifier/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/pruning/_experimental/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/pruning/_experimental/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/pruning/__init__.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/pruning/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/ns/fx/weight_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/ns/fx/weight_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/ns/fx/utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/ns/fx/utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/ns/fx/qconfig_multi_mapping.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/ns/fx/qconfig_multi_mapping.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/ns/fx/pattern_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/ns/fx/pattern_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/ns/fx/ns_types.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/ns/fx/ns_types.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/ns/fx/n_shadows_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/ns/fx/n_shadows_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/ns/fx/mappings.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/ns/fx/mappings.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/ns/fx/graph_passes.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/ns/fx/graph_passes.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/ns/fx/graph_matcher.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/ns/fx/graph_matcher.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/ns/fx/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/ns/fx/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/ns/_numeric_suite_fx.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/ns/_numeric_suite_fx.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/ns/_numeric_suite.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/ns/_numeric_suite.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/ns/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/ns/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/nn/sparse/quantized/utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/nn/sparse/quantized/utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/nn/sparse/quantized/linear.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/nn/sparse/quantized/linear.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/nn/sparse/quantized/dynamic/linear.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/nn/sparse/quantized/dynamic/linear.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/nn/sparse/quantized/dynamic/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/nn/sparse/quantized/dynamic/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/nn/sparse/quantized/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/nn/sparse/quantized/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/nn/sparse/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/nn/sparse/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/nn/quantized/reference/modules/utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/nn/quantized/reference/modules/utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/nn/quantized/reference/modules/sparse.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/nn/quantized/reference/modules/sparse.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/nn/quantized/reference/modules/rnn.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/nn/quantized/reference/modules/rnn.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/nn/quantized/reference/modules/linear.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/nn/quantized/reference/modules/linear.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/nn/quantized/reference/modules/conv.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/nn/quantized/reference/modules/conv.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/nn/quantized/reference/modules/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/nn/quantized/reference/modules/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/nn/quantized/reference/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/nn/quantized/reference/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/nn/quantized/modules/utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/nn/quantized/modules/utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/nn/quantized/modules/rnn.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/nn/quantized/modules/rnn.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/nn/quantized/modules/normalization.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/nn/quantized/modules/normalization.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/nn/quantized/modules/linear.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/nn/quantized/modules/linear.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/nn/quantized/modules/functional_modules.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/nn/quantized/modules/functional_modules.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/nn/quantized/modules/embedding_ops.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/nn/quantized/modules/embedding_ops.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/nn/quantized/modules/dropout.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/nn/quantized/modules/dropout.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/nn/quantized/modules/conv.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/nn/quantized/modules/conv.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/nn/quantized/modules/batchnorm.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/nn/quantized/modules/batchnorm.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/nn/quantized/modules/activation.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/nn/quantized/modules/activation.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/nn/quantized/modules/__init__.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/nn/quantized/modules/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/nn/quantized/functional.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/nn/quantized/functional.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/nn/quantized/dynamic/modules/rnn.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/nn/quantized/dynamic/modules/rnn.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/nn/quantized/dynamic/modules/linear.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/nn/quantized/dynamic/modules/linear.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/nn/quantized/dynamic/modules/conv.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/nn/quantized/dynamic/modules/conv.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/nn/quantized/dynamic/modules/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/nn/quantized/dynamic/modules/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/nn/quantized/dynamic/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/nn/quantized/dynamic/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/nn/quantized/__init__.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/nn/quantized/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/nn/quantizable/modules/rnn.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/nn/quantizable/modules/rnn.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/nn/quantizable/modules/activation.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/nn/quantizable/modules/activation.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/nn/quantizable/modules/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/nn/quantizable/modules/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/nn/quantizable/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/nn/quantizable/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/nn/qat/modules/linear.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/nn/qat/modules/linear.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/nn/qat/modules/embedding_ops.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/nn/qat/modules/embedding_ops.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/nn/qat/modules/conv.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/nn/qat/modules/conv.py + for f in 
`find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/ao/nn/qat/modules/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/nn/qat/modules/__init__.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/ao/nn/qat/dynamic/modules/linear.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/nn/qat/dynamic/modules/linear.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/ao/nn/qat/dynamic/modules/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/nn/qat/dynamic/modules/__init__.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/ao/nn/qat/dynamic/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/nn/qat/dynamic/__init__.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/ao/nn/qat/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/nn/qat/__init__.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/ao/nn/intrinsic/quantized/modules/linear_relu.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/nn/intrinsic/quantized/modules/linear_relu.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/ao/nn/intrinsic/quantized/modules/conv_relu.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/nn/intrinsic/quantized/modules/conv_relu.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/ao/nn/intrinsic/quantized/modules/conv_add.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/nn/intrinsic/quantized/modules/conv_add.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/ao/nn/intrinsic/quantized/modules/bn_relu.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/nn/intrinsic/quantized/modules/bn_relu.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/ao/nn/intrinsic/quantized/modules/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/nn/intrinsic/quantized/modules/__init__.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/ao/nn/intrinsic/quantized/dynamic/modules/linear_relu.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/nn/intrinsic/quantized/dynamic/modules/linear_relu.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/ao/nn/intrinsic/quantized/dynamic/modules/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/nn/intrinsic/quantized/dynamic/modules/__init__.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/ao/nn/intrinsic/quantized/dynamic/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/nn/intrinsic/quantized/dynamic/__init__.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/ao/nn/intrinsic/quantized/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/nn/intrinsic/quantized/__init__.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/ao/nn/intrinsic/qat/modules/linear_relu.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/nn/intrinsic/qat/modules/linear_relu.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/ao/nn/intrinsic/qat/modules/linear_fused.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/nn/intrinsic/qat/modules/linear_fused.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/ao/nn/intrinsic/qat/modules/conv_fused.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/nn/intrinsic/qat/modules/conv_fused.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/ao/nn/intrinsic/qat/modules/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/nn/intrinsic/qat/modules/__init__.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/ao/nn/intrinsic/qat/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/nn/intrinsic/qat/__init__.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/ao/nn/intrinsic/modules/fused.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/nn/intrinsic/modules/fused.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/ao/nn/intrinsic/modules/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/nn/intrinsic/modules/__init__.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/ao/nn/intrinsic/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/nn/intrinsic/__init__.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/ao/nn/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/nn/__init__.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/ao/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/ao/__init__.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/amp/grad_scaler.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/amp/grad_scaler.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/amp/autocast_mode.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/amp/autocast_mode.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/amp/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/amp/__init__.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_weights_only_unpickler.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_weights_only_unpickler.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_vmap_internals.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_vmap_internals.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_vendor/packaging/version.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_vendor/packaging/version.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_vendor/packaging/_structures.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_vendor/packaging/_structures.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_vendor/packaging/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_vendor/packaging/__init__.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_vendor/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_vendor/__init__.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_utils_internal.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_utils_internal.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_utils.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_torch_docs.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_torch_docs.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_tensor_str.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_tensor_str.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_tensor_docs.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_tensor_docs.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_tensor.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_tensor.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_subclasses/schema_check_mode.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_subclasses/schema_check_mode.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_subclasses/meta_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_subclasses/meta_utils.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_subclasses/functional_tensor.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_subclasses/functional_tensor.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_subclasses/fake_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_subclasses/fake_utils.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_subclasses/fake_tensor.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_subclasses/fake_tensor.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_subclasses/fake_impls.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_subclasses/fake_impls.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_subclasses/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_subclasses/__init__.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_streambase.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_streambase.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_storage_docs.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_storage_docs.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_sources.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_sources.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_refs/special/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_refs/special/__init__.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_refs/nn/functional/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_refs/nn/functional/__init__.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_refs/nn/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_refs/nn/__init__.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_refs/linalg/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_refs/linalg/__init__.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_refs/fft.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_refs/fft.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_refs/_conversions.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_refs/_conversions.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_refs/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_refs/__init__.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_python_dispatcher.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_python_dispatcher.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_prims_common/wrappers.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_prims_common/wrappers.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_prims_common/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_prims_common/__init__.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_prims/rng_prims.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_prims/rng_prims.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_prims/executor.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_prims/executor.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_prims/debug_prims.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_prims/debug_prims.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_prims/context.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_prims/context.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_prims/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_prims/__init__.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_ops.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_ops.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_numpy/testing/utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_numpy/testing/utils.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_numpy/testing/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_numpy/testing/__init__.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_numpy/random.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_numpy/random.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_numpy/linalg.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_numpy/linalg.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_numpy/fft.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_numpy/fft.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_numpy/_util.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_numpy/_util.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_numpy/_unary_ufuncs_impl.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_numpy/_unary_ufuncs_impl.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_numpy/_ufuncs.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_numpy/_ufuncs.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_numpy/_reductions_impl.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_numpy/_reductions_impl.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_numpy/_normalizations.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_numpy/_normalizations.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_numpy/_ndarray.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_numpy/_ndarray.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_numpy/_getlimits.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_numpy/_getlimits.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_numpy/_funcs_impl.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_numpy/_funcs_impl.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_numpy/_funcs.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_numpy/_funcs.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_numpy/_dtypes_impl.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_numpy/_dtypes_impl.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_numpy/_dtypes.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_numpy/_dtypes.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_numpy/_casting_dicts.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_numpy/_casting_dicts.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_numpy/_binary_ufuncs_impl.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_numpy/_binary_ufuncs_impl.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_numpy/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_numpy/__init__.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_namedtensor_internals.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_namedtensor_internals.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_meta_registrations.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_meta_registrations.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_lowrank.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_lowrank.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_logging/structured.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_logging/structured.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_logging/_registrations.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_logging/_registrations.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_logging/_internal.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_logging/_internal.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_logging/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_logging/__init__.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_lobpcg.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_lobpcg.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_linalg_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_linalg_utils.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_library/utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_library/utils.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_library/simple_registry.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_library/simple_registry.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_library/fake_class_registry.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_library/fake_class_registry.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_library/custom_ops.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_library/custom_ops.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_library/autograd.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_library/autograd.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_library/abstract_impl.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_library/abstract_impl.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_library/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_library/__init__.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_lazy/ts_backend.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_lazy/ts_backend.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_lazy/tensor_factory_functions.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_lazy/tensor_factory_functions.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_lazy/metrics.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_lazy/metrics.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_lazy/ir_cache.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_lazy/ir_cache.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_lazy/extract_compiled_graph.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_lazy/extract_compiled_graph.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_lazy/device_context.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_lazy/device_context.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_lazy/debug.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_lazy/debug.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_lazy/config.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_lazy/config.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_lazy/computation.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_lazy/computation.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_lazy/closure.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_lazy/closure.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_lazy/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_lazy/__init__.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_jit_internal.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_jit_internal.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_inductor/wrapper_benchmark.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_inductor/wrapper_benchmark.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_inductor/virtualized.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_inductor/virtualized.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_inductor/utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_inductor/utils.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_inductor/triton_heuristics.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_inductor/triton_heuristics.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_inductor/triton_helpers.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_inductor/triton_helpers.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_inductor/test_operators.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_inductor/test_operators.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_inductor/test_case.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_inductor/test_case.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_inductor/sizevars.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_inductor/sizevars.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_inductor/select_algorithm.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_inductor/select_algorithm.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_inductor/scheduler.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_inductor/scheduler.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_inductor/quantized_lowerings.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_inductor/quantized_lowerings.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_inductor/pattern_matcher.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_inductor/pattern_matcher.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_inductor/optimize_indexing.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_inductor/optimize_indexing.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_inductor/ops_handler.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_inductor/ops_handler.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_inductor/mkldnn_lowerings.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_inductor/mkldnn_lowerings.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_inductor/metrics.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_inductor/metrics.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_inductor/lowering.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_inductor/lowering.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_inductor/kernel/unpack_mixed_mm.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_inductor/kernel/unpack_mixed_mm.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_inductor/kernel/templated_attention.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_inductor/kernel/templated_attention.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_inductor/kernel/mm_plus_mm.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_inductor/kernel/mm_plus_mm.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_inductor/kernel/mm_common.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_inductor/kernel/mm_common.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_inductor/kernel/mm.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_inductor/kernel/mm.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_inductor/kernel/conv.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_inductor/kernel/conv.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_inductor/kernel/bmm.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_inductor/kernel/bmm.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_inductor/kernel/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_inductor/kernel/__init__.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_inductor/ir.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_inductor/ir.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_inductor/inductor_prims.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_inductor/inductor_prims.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_inductor/index_propagation.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_inductor/index_propagation.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_inductor/hooks.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_inductor/hooks.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_inductor/graph.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_inductor/graph.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_inductor/fx_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_inductor/fx_utils.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_inductor/fx_passes/split_cat.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_inductor/fx_passes/split_cat.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_inductor/fx_passes/serialized_patterns/mm_pattern.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_inductor/fx_passes/serialized_patterns/mm_pattern.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_inductor/fx_passes/serialized_patterns/bmm_pattern.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_inductor/fx_passes/serialized_patterns/bmm_pattern.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_inductor/fx_passes/serialized_patterns/addmm_pattern.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_inductor/fx_passes/serialized_patterns/addmm_pattern.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_inductor/fx_passes/serialized_patterns/_sfdp_pattern_9.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_inductor/fx_passes/serialized_patterns/_sfdp_pattern_9.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_inductor/fx_passes/serialized_patterns/_sfdp_pattern_8.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_inductor/fx_passes/serialized_patterns/_sfdp_pattern_8.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_inductor/fx_passes/serialized_patterns/_sfdp_pattern_7.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_inductor/fx_passes/serialized_patterns/_sfdp_pattern_7.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_inductor/fx_passes/serialized_patterns/_sfdp_pattern_6.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_inductor/fx_passes/serialized_patterns/_sfdp_pattern_6.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_inductor/fx_passes/serialized_patterns/_sfdp_pattern_5.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_inductor/fx_passes/serialized_patterns/_sfdp_pattern_5.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_inductor/fx_passes/serialized_patterns/_sfdp_pattern_4.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_inductor/fx_passes/serialized_patterns/_sfdp_pattern_4.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_inductor/fx_passes/serialized_patterns/_sfdp_pattern_3.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_inductor/fx_passes/serialized_patterns/_sfdp_pattern_3.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_inductor/fx_passes/serialized_patterns/_sfdp_pattern_2.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_inductor/fx_passes/serialized_patterns/_sfdp_pattern_2.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_inductor/fx_passes/serialized_patterns/_sfdp_pattern_18.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_inductor/fx_passes/serialized_patterns/_sfdp_pattern_18.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_inductor/fx_passes/serialized_patterns/_sfdp_pattern_17.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_inductor/fx_passes/serialized_patterns/_sfdp_pattern_17.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_inductor/fx_passes/serialized_patterns/_sfdp_pattern_16.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_inductor/fx_passes/serialized_patterns/_sfdp_pattern_16.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_inductor/fx_passes/serialized_patterns/_sfdp_pattern_15.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_inductor/fx_passes/serialized_patterns/_sfdp_pattern_15.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_inductor/fx_passes/serialized_patterns/_sfdp_pattern_14.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_inductor/fx_passes/serialized_patterns/_sfdp_pattern_14.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_inductor/fx_passes/serialized_patterns/_sfdp_pattern_13.py
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_inductor/fx_passes/serialized_patterns/_sfdp_pattern_13.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_inductor/fx_passes/serialized_patterns/_sfdp_pattern_12.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_inductor/fx_passes/serialized_patterns/_sfdp_pattern_12.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_inductor/fx_passes/serialized_patterns/_sfdp_pattern_11.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_inductor/fx_passes/serialized_patterns/_sfdp_pattern_11.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_inductor/fx_passes/serialized_patterns/_sfdp_pattern_10.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_inductor/fx_passes/serialized_patterns/_sfdp_pattern_10.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_inductor/fx_passes/serialized_patterns/_sfdp_pattern_1.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_inductor/fx_passes/serialized_patterns/_sfdp_pattern_1.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_inductor/fx_passes/serialized_patterns/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_inductor/fx_passes/serialized_patterns/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_inductor/fx_passes/replace_random.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_inductor/fx_passes/replace_random.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_inductor/fx_passes/reinplace.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_inductor/fx_passes/reinplace.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_inductor/fx_passes/quantization.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_inductor/fx_passes/quantization.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_inductor/fx_passes/pre_grad.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_inductor/fx_passes/pre_grad.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_inductor/fx_passes/post_grad.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_inductor/fx_passes/post_grad.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_inductor/fx_passes/pad_mm.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_inductor/fx_passes/pad_mm.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_inductor/fx_passes/numeric_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_inductor/fx_passes/numeric_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_inductor/fx_passes/mkldnn_fusion.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_inductor/fx_passes/mkldnn_fusion.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_inductor/fx_passes/misc_patterns.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_inductor/fx_passes/misc_patterns.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_inductor/fx_passes/joint_graph.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_inductor/fx_passes/joint_graph.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_inductor/fx_passes/group_batch_fusion.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_inductor/fx_passes/group_batch_fusion.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_inductor/fx_passes/fuse_attention.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_inductor/fx_passes/fuse_attention.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_inductor/fx_passes/freezing_patterns.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_inductor/fx_passes/freezing_patterns.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_inductor/fx_passes/efficient_conv_bn_eval.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_inductor/fx_passes/efficient_conv_bn_eval.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_inductor/fx_passes/dedupe_symint_uses.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_inductor/fx_passes/dedupe_symint_uses.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_inductor/fx_passes/decompose_mem_bound_mm.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_inductor/fx_passes/decompose_mem_bound_mm.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_inductor/fx_passes/ddp_fusion.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_inductor/fx_passes/ddp_fusion.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_inductor/fx_passes/binary_folding.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_inductor/fx_passes/binary_folding.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_inductor/fx_passes/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_inductor/fx_passes/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_inductor/freezing.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_inductor/freezing.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_inductor/exc.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_inductor/exc.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_inductor/dependencies.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_inductor/dependencies.py + for f in `find ./torch/ 
-name '*.py'` + install -D -pm 644 ./torch/_inductor/decomposition.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_inductor/decomposition.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_inductor/debug.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_inductor/debug.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_inductor/cudagraph_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_inductor/cudagraph_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_inductor/cudagraph_trees.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_inductor/cudagraph_trees.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_inductor/coordinate_descent_tuner.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_inductor/coordinate_descent_tuner.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_inductor/constant_folding.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_inductor/constant_folding.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_inductor/config.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_inductor/config.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_inductor/compile_fx.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_inductor/compile_fx.py + for f in `find ./torch/ 
-name '*.py'` + install -D -pm 644 ./torch/_inductor/comms.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_inductor/comms.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_inductor/comm_analysis.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_inductor/comm_analysis.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_inductor/codegen/xpu/device_op_overrides.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_inductor/codegen/xpu/device_op_overrides.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_inductor/codegen/xpu/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_inductor/codegen/xpu/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_inductor/codegen/wrapper.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_inductor/codegen/wrapper.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_inductor/codegen/triton_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_inductor/codegen/triton_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_inductor/codegen/triton_split_scan.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_inductor/codegen/triton_split_scan.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_inductor/codegen/triton_foreach.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_inductor/codegen/triton_foreach.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_inductor/codegen/triton.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_inductor/codegen/triton.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_inductor/codegen/multi_kernel.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_inductor/codegen/multi_kernel.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_inductor/codegen/memory_planning.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_inductor/codegen/memory_planning.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_inductor/codegen/cuda_combined_scheduling.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_inductor/codegen/cuda_combined_scheduling.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_inductor/codegen/cuda/gemm_template.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_inductor/codegen/cuda/gemm_template.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_inductor/codegen/cuda/device_op_overrides.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_inductor/codegen/cuda/device_op_overrides.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_inductor/codegen/cuda/cutlass_utils.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_inductor/codegen/cuda/cutlass_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_inductor/codegen/cuda/cutlass_lib_extensions/gemm_operation_extensions.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_inductor/codegen/cuda/cutlass_lib_extensions/gemm_operation_extensions.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_inductor/codegen/cuda/cutlass_lib_extensions/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_inductor/codegen/cuda/cutlass_lib_extensions/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_inductor/codegen/cuda/cutlass_epilogue_gen.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_inductor/codegen/cuda/cutlass_epilogue_gen.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_inductor/codegen/cuda/cuda_template.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_inductor/codegen/cuda/cuda_template.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_inductor/codegen/cuda/cuda_kernel.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_inductor/codegen/cuda/cuda_kernel.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_inductor/codegen/cuda/cuda_env.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_inductor/codegen/cuda/cuda_env.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 
./torch/_inductor/codegen/cuda/cuda_cpp_scheduling.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_inductor/codegen/cuda/cuda_cpp_scheduling.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_inductor/codegen/cuda/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_inductor/codegen/cuda/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_inductor/codegen/cpp_wrapper_cuda.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_inductor/codegen/cpp_wrapper_cuda.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_inductor/codegen/cpp_wrapper_cpu.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_inductor/codegen/cpp_wrapper_cpu.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_inductor/codegen/cpp.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_inductor/codegen/cpp.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_inductor/codegen/common.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_inductor/codegen/common.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_inductor/codegen/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_inductor/codegen/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_inductor/codecache.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_inductor/codecache.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_inductor/bounds.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_inductor/bounds.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_inductor/autotune_process.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_inductor/autotune_process.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_inductor/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_inductor/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_higher_order_ops/wrap.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_higher_order_ops/wrap.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_higher_order_ops/while_loop.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_higher_order_ops/while_loop.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_higher_order_ops/utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_higher_order_ops/utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_higher_order_ops/triton_kernel_wrap.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_higher_order_ops/triton_kernel_wrap.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 
./torch/_higher_order_ops/torchbind.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_higher_order_ops/torchbind.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_higher_order_ops/templated_attention.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_higher_order_ops/templated_attention.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_higher_order_ops/strict_mode.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_higher_order_ops/strict_mode.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_higher_order_ops/out_dtype.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_higher_order_ops/out_dtype.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_higher_order_ops/map.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_higher_order_ops/map.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_higher_order_ops/effects.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_higher_order_ops/effects.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_higher_order_ops/cond.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_higher_order_ops/cond.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_higher_order_ops/auto_functionalize.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_higher_order_ops/auto_functionalize.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_higher_order_ops/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_higher_order_ops/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_guards.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_guards.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_functorch/vmap.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_functorch/vmap.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_functorch/utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_functorch/utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_functorch/top_operators_github_usage.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_functorch/top_operators_github_usage.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_functorch/pytree_hacks.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_functorch/pytree_hacks.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_functorch/python_key.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_functorch/python_key.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_functorch/pyfunctorch.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_functorch/pyfunctorch.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_functorch/partitioners.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_functorch/partitioners.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_functorch/make_functional.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_functorch/make_functional.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_functorch/fx_minifier.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_functorch/fx_minifier.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_functorch/functional_call.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_functorch/functional_call.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_functorch/eager_transforms.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_functorch/eager_transforms.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_functorch/deprecated.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_functorch/deprecated.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_functorch/config.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_functorch/config.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_functorch/compilers.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_functorch/compilers.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_functorch/compile_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_functorch/compile_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_functorch/benchmark_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_functorch/benchmark_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_functorch/batch_norm_replacement.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_functorch/batch_norm_replacement.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_functorch/autograd_function.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_functorch/autograd_function.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_functorch/apis.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_functorch/apis.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_functorch/aot_autograd.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_functorch/aot_autograd.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_functorch/_aot_autograd/utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_functorch/_aot_autograd/utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 
./torch/_functorch/_aot_autograd/traced_function_transforms.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_functorch/_aot_autograd/traced_function_transforms.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_functorch/_aot_autograd/subclass_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_functorch/_aot_autograd/subclass_utils.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_functorch/_aot_autograd/schemas.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_functorch/_aot_autograd/schemas.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_functorch/_aot_autograd/runtime_wrappers.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_functorch/_aot_autograd/runtime_wrappers.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_functorch/_aot_autograd/logging_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_functorch/_aot_autograd/logging_utils.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_functorch/_aot_autograd/jit_compile_runtime_wrappers.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_functorch/_aot_autograd/jit_compile_runtime_wrappers.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_functorch/_aot_autograd/input_output_analysis.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_functorch/_aot_autograd/input_output_analysis.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_functorch/_aot_autograd/functional_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_functorch/_aot_autograd/functional_utils.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_functorch/_aot_autograd/dispatch_and_compile_graph.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_functorch/_aot_autograd/dispatch_and_compile_graph.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_functorch/_aot_autograd/collect_metadata_analysis.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_functorch/_aot_autograd/collect_metadata_analysis.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_functorch/_aot_autograd/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_functorch/_aot_autograd/__init__.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_functorch/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_functorch/__init__.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_export/wrappers.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_export/wrappers.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_export/verifier.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_export/verifier.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_export/utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_export/utils.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_export/serde/upgrade.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_export/serde/upgrade.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_export/serde/union.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_export/serde/union.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_export/serde/serialize.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_export/serde/serialize.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_export/serde/schema_check.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_export/serde/schema_check.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_export/serde/schema.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_export/serde/schema.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_export/serde/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_export/serde/__init__.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_export/passes/replace_view_ops_with_view_copy_ops_pass.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_export/passes/replace_view_ops_with_view_copy_ops_pass.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_export/passes/replace_sym_size_ops_pass.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_export/passes/replace_sym_size_ops_pass.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_export/passes/replace_set_grad_with_hop_pass.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_export/passes/replace_set_grad_with_hop_pass.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_export/passes/remove_runtime_assertions.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_export/passes/remove_runtime_assertions.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_export/passes/lift_constants_pass.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_export/passes/lift_constants_pass.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_export/passes/functionalize_side_effectful_ops_pass.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_export/passes/functionalize_side_effectful_ops_pass.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_export/passes/collect_tracepoints_pass.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_export/passes/collect_tracepoints_pass.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_export/passes/add_runtime_assertions_for_constraints_pass.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_export/passes/add_runtime_assertions_for_constraints_pass.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_export/passes/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_export/passes/__init__.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_export/pass_infra/proxy_value.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_export/pass_infra/proxy_value.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_export/pass_infra/node_metadata.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_export/pass_infra/node_metadata.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_export/pass_infra/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_export/pass_infra/__init__.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_export/pass_base.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_export/pass_base.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_export/non_strict_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_export/non_strict_utils.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_export/exported_program.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_export/exported_program.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_export/error.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_export/error.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_export/db/logging.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_export/db/logging.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_export/db/gen_example.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_export/db/gen_example.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_export/db/examples/user_input_mutation.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_export/db/examples/user_input_mutation.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_export/db/examples/type_reflection_method.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_export/db/examples/type_reflection_method.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_export/db/examples/torch_sym_min.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_export/db/examples/torch_sym_min.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_export/db/examples/tensor_setattr.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_export/db/examples/tensor_setattr.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_export/db/examples/static_if.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_export/db/examples/static_if.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_export/db/examples/static_for_loop.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_export/db/examples/static_for_loop.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_export/db/examples/specialized_attribute.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_export/db/examples/specialized_attribute.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_export/db/examples/scalar_output.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_export/db/examples/scalar_output.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_export/db/examples/pytree_flatten.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_export/db/examples/pytree_flatten.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_export/db/examples/optional_input.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_export/db/examples/optional_input.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_export/db/examples/null_context_manager.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_export/db/examples/null_context_manager.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_export/db/examples/nested_function.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_export/db/examples/nested_function.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_export/db/examples/model_attr_mutation.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_export/db/examples/model_attr_mutation.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_export/db/examples/list_unpack.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_export/db/examples/list_unpack.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_export/db/examples/list_contains.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_export/db/examples/list_contains.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_export/db/examples/fn_with_kwargs.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_export/db/examples/fn_with_kwargs.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_export/db/examples/dynamic_shape_view.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_export/db/examples/dynamic_shape_view.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_export/db/examples/dynamic_shape_slicing.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_export/db/examples/dynamic_shape_slicing.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_export/db/examples/dynamic_shape_round.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_export/db/examples/dynamic_shape_round.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_export/db/examples/dynamic_shape_map.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_export/db/examples/dynamic_shape_map.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_export/db/examples/dynamic_shape_if_guard.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_export/db/examples/dynamic_shape_if_guard.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_export/db/examples/dynamic_shape_constructor.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_export/db/examples/dynamic_shape_constructor.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_export/db/examples/dynamic_shape_assert.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_export/db/examples/dynamic_shape_assert.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_export/db/examples/dictionary.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_export/db/examples/dictionary.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_export/db/examples/decorator.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_export/db/examples/decorator.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_export/db/examples/constrain_as_value_example.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_export/db/examples/constrain_as_value_example.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_export/db/examples/constrain_as_size_example.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_export/db/examples/constrain_as_size_example.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_export/db/examples/cond_predicate.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_export/db/examples/cond_predicate.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_export/db/examples/cond_operands.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_export/db/examples/cond_operands.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_export/db/examples/cond_closed_over_variable.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_export/db/examples/cond_closed_over_variable.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_export/db/examples/cond_branch_nonlocal_variables.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_export/db/examples/cond_branch_nonlocal_variables.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_export/db/examples/cond_branch_nested_function.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_export/db/examples/cond_branch_nested_function.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_export/db/examples/cond_branch_class_method.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_export/db/examples/cond_branch_class_method.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_export/db/examples/class_method.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_export/db/examples/class_method.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_export/db/examples/autograd_function.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_export/db/examples/autograd_function.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_export/db/examples/assume_constant_result.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_export/db/examples/assume_constant_result.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_export/db/examples/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_export/db/examples/__init__.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_export/db/case.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_export/db/case.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_export/db/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_export/db/__init__.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_export/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_export/__init__.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_dynamo/variables/user_defined.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_dynamo/variables/user_defined.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_dynamo/variables/torch_function.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_dynamo/variables/torch_function.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_dynamo/variables/torch.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_dynamo/variables/torch.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_dynamo/variables/tensor.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_dynamo/variables/tensor.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_dynamo/variables/sdpa.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_dynamo/variables/sdpa.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_dynamo/variables/optimizer.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_dynamo/variables/optimizer.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_dynamo/variables/nn_module.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_dynamo/variables/nn_module.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_dynamo/variables/misc.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_dynamo/variables/misc.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_dynamo/variables/lists.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_dynamo/variables/lists.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_dynamo/variables/lazy.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_dynamo/variables/lazy.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_dynamo/variables/iter.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_dynamo/variables/iter.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_dynamo/variables/higher_order_ops.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_dynamo/variables/higher_order_ops.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_dynamo/variables/functions.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_dynamo/variables/functions.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_dynamo/variables/distributed.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_dynamo/variables/distributed.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_dynamo/variables/dicts.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_dynamo/variables/dicts.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_dynamo/variables/ctx_manager.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_dynamo/variables/ctx_manager.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_dynamo/variables/constant.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_dynamo/variables/constant.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_dynamo/variables/builtin.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_dynamo/variables/builtin.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_dynamo/variables/builder.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_dynamo/variables/builder.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_dynamo/variables/base.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_dynamo/variables/base.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_dynamo/variables/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_dynamo/variables/__init__.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_dynamo/utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_dynamo/utils.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_dynamo/types.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_dynamo/types.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_dynamo/trace_rules.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_dynamo/trace_rules.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_dynamo/testing.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_dynamo/testing.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_dynamo/test_minifier_common.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_dynamo/test_minifier_common.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_dynamo/test_case.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_dynamo/test_case.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_dynamo/tensor_version_op.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_dynamo/tensor_version_op.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_dynamo/symbolic_convert.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_dynamo/symbolic_convert.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_dynamo/source.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_dynamo/source.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_dynamo/side_effects.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_dynamo/side_effects.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_dynamo/resume_execution.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_dynamo/resume_execution.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_dynamo/repro/after_dynamo.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_dynamo/repro/after_dynamo.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_dynamo/repro/after_aot.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_dynamo/repro/after_aot.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_dynamo/repro/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_dynamo/repro/__init__.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_dynamo/replay_record.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_dynamo/replay_record.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_dynamo/profiler.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_dynamo/profiler.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_dynamo/polyfill.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_dynamo/polyfill.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_dynamo/output_graph.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_dynamo/output_graph.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_dynamo/mutation_guard.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_dynamo/mutation_guard.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_dynamo/logging.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_dynamo/logging.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_dynamo/hooks.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_dynamo/hooks.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_dynamo/guards.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_dynamo/guards.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_dynamo/funcname_cache.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_dynamo/funcname_cache.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_dynamo/external_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_dynamo/external_utils.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_dynamo/exc.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_dynamo/exc.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_dynamo/eval_frame.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_dynamo/eval_frame.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_dynamo/device_interface.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_dynamo/device_interface.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_dynamo/decorators.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_dynamo/decorators.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_dynamo/debug_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_dynamo/debug_utils.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_dynamo/current_scope_id.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_dynamo/current_scope_id.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_dynamo/create_parameter_op.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_dynamo/create_parameter_op.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_dynamo/convert_frame.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_dynamo/convert_frame.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_dynamo/config.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_dynamo/config.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_dynamo/comptime.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_dynamo/comptime.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_dynamo/compiled_autograd.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_dynamo/compiled_autograd.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_dynamo/codegen.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_dynamo/codegen.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_dynamo/code_context.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_dynamo/code_context.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_dynamo/callback.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_dynamo/callback.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_dynamo/cache_size.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_dynamo/cache_size.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_dynamo/bytecode_transformation.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_dynamo/bytecode_transformation.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_dynamo/bytecode_analysis.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_dynamo/bytecode_analysis.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_dynamo/backends/tvm.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_dynamo/backends/tvm.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_dynamo/backends/torchxla.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_dynamo/backends/torchxla.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_dynamo/backends/tensorrt.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_dynamo/backends/tensorrt.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_dynamo/backends/registry.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_dynamo/backends/registry.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_dynamo/backends/onnxrt.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_dynamo/backends/onnxrt.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_dynamo/backends/inductor.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_dynamo/backends/inductor.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_dynamo/backends/distributed.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_dynamo/backends/distributed.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_dynamo/backends/debugging.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_dynamo/backends/debugging.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm 644 ./torch/_dynamo/backends/cudagraphs.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_dynamo/backends/cudagraphs.py
+ for f in `find ./torch/ -name '*.py'`
+ install -D -pm
644 ./torch/_dynamo/backends/common.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_dynamo/backends/common.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_dynamo/backends/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_dynamo/backends/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_dynamo/_trace_wrapped_higher_order_op.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_dynamo/_trace_wrapped_higher_order_op.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_dynamo/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_dynamo/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_dispatch/python.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_dispatch/python.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_dispatch/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_dispatch/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_deploy.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_deploy.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_decomp/decompositions_for_rng.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_decomp/decompositions_for_rng.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 
./torch/_decomp/decompositions_for_jvp.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_decomp/decompositions_for_jvp.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_decomp/decompositions.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_decomp/decompositions.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_decomp/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_decomp/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_custom_ops.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_custom_ops.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_custom_op/impl.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_custom_op/impl.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_custom_op/functional.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_custom_op/functional.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_custom_op/autograd.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_custom_op/autograd.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_custom_op/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_custom_op/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_compile.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_compile.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_classes.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_classes.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_awaits/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_awaits/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_appdirs.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_appdirs.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/__future__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/__future__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/__config__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/__config__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_VF.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torch/_VF.py ++ find ./torchgen/ -name '*.py' + for f in `find ./torchgen/ -name '*.py'` + install -D -pm 644 ./torchgen/yaml_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torchgen/yaml_utils.py + for f in `find ./torchgen/ -name 
'*.py'` + install -D -pm 644 ./torchgen/utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torchgen/utils.py + for f in `find ./torchgen/ -name '*.py'` + install -D -pm 644 ./torchgen/static_runtime/generator.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torchgen/static_runtime/generator.py + for f in `find ./torchgen/ -name '*.py'` + install -D -pm 644 ./torchgen/static_runtime/gen_static_runtime_ops.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torchgen/static_runtime/gen_static_runtime_ops.py + for f in `find ./torchgen/ -name '*.py'` + install -D -pm 644 ./torchgen/static_runtime/config.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torchgen/static_runtime/config.py + for f in `find ./torchgen/ -name '*.py'` + install -D -pm 644 ./torchgen/static_runtime/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torchgen/static_runtime/__init__.py + for f in `find ./torchgen/ -name '*.py'` + install -D -pm 644 ./torchgen/shape_functions/gen_jit_shape_functions.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torchgen/shape_functions/gen_jit_shape_functions.py + for f in `find ./torchgen/ -name '*.py'` + install -D -pm 644 ./torchgen/selective_build/selector.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torchgen/selective_build/selector.py + for f in `find ./torchgen/ -name '*.py'` + install -D -pm 644 ./torchgen/selective_build/operator.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torchgen/selective_build/operator.py + for f in `find ./torchgen/ -name '*.py'` + install -D -pm 644 ./torchgen/selective_build/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torchgen/selective_build/__init__.py + for f in `find ./torchgen/ -name '*.py'` + install -D -pm 644 ./torchgen/operator_versions/gen_mobile_upgraders_constant.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torchgen/operator_versions/gen_mobile_upgraders_constant.py + for f in `find ./torchgen/ -name '*.py'` + install -D -pm 644 ./torchgen/operator_versions/gen_mobile_upgraders.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torchgen/operator_versions/gen_mobile_upgraders.py + for f in `find ./torchgen/ -name '*.py'` + install -D -pm 644 ./torchgen/operator_versions/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torchgen/operator_versions/__init__.py + for f in `find ./torchgen/ -name '*.py'` + install -D -pm 644 ./torchgen/native_function_generation.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torchgen/native_function_generation.py + for f in `find ./torchgen/ -name '*.py'` + install -D -pm 644 ./torchgen/model.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torchgen/model.py + for f in `find ./torchgen/ -name '*.py'` + install -D -pm 644 ./torchgen/local.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torchgen/local.py + for f 
in `find ./torchgen/ -name '*.py'` + install -D -pm 644 ./torchgen/gen_vmap_plumbing.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torchgen/gen_vmap_plumbing.py + for f in `find ./torchgen/ -name '*.py'` + install -D -pm 644 ./torchgen/gen_lazy_tensor.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torchgen/gen_lazy_tensor.py + for f in `find ./torchgen/ -name '*.py'` + install -D -pm 644 ./torchgen/gen_functionalization_type.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torchgen/gen_functionalization_type.py + for f in `find ./torchgen/ -name '*.py'` + install -D -pm 644 ./torchgen/gen_executorch.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torchgen/gen_executorch.py + for f in `find ./torchgen/ -name '*.py'` + install -D -pm 644 ./torchgen/gen_backend_stubs.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torchgen/gen_backend_stubs.py + for f in `find ./torchgen/ -name '*.py'` + install -D -pm 644 ./torchgen/gen_aoti_c_shim.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torchgen/gen_aoti_c_shim.py + for f in `find ./torchgen/ -name '*.py'` + install -D -pm 644 ./torchgen/gen.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torchgen/gen.py + for f in `find ./torchgen/ -name '*.py'` + install -D -pm 644 ./torchgen/fuse/gen_patterns.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torchgen/fuse/gen_patterns.py + for f in `find ./torchgen/ -name '*.py'` + install -D 
-pm 644 ./torchgen/executorch/parse.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torchgen/executorch/parse.py + for f in `find ./torchgen/ -name '*.py'` + install -D -pm 644 ./torchgen/executorch/model.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torchgen/executorch/model.py + for f in `find ./torchgen/ -name '*.py'` + install -D -pm 644 ./torchgen/executorch/api/unboxing.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torchgen/executorch/api/unboxing.py + for f in `find ./torchgen/ -name '*.py'` + install -D -pm 644 ./torchgen/executorch/api/types/types.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torchgen/executorch/api/types/types.py + for f in `find ./torchgen/ -name '*.py'` + install -D -pm 644 ./torchgen/executorch/api/types/signatures.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torchgen/executorch/api/types/signatures.py + for f in `find ./torchgen/ -name '*.py'` + install -D -pm 644 ./torchgen/executorch/api/types/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torchgen/executorch/api/types/__init__.py + for f in `find ./torchgen/ -name '*.py'` + install -D -pm 644 ./torchgen/executorch/api/et_cpp.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torchgen/executorch/api/et_cpp.py + for f in `find ./torchgen/ -name '*.py'` + install -D -pm 644 ./torchgen/executorch/api/custom_ops.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torchgen/executorch/api/custom_ops.py + for f in `find ./torchgen/ -name '*.py'` + install -D -pm 644 ./torchgen/executorch/api/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torchgen/executorch/api/__init__.py + for f in `find ./torchgen/ -name '*.py'` + install -D -pm 644 ./torchgen/executorch/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torchgen/executorch/__init__.py + for f in `find ./torchgen/ -name '*.py'` + install -D -pm 644 ./torchgen/dest/ufunc.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torchgen/dest/ufunc.py + for f in `find ./torchgen/ -name '*.py'` + install -D -pm 644 ./torchgen/dest/register_dispatch_key.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torchgen/dest/register_dispatch_key.py + for f in `find ./torchgen/ -name '*.py'` + install -D -pm 644 ./torchgen/dest/native_functions.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torchgen/dest/native_functions.py + for f in `find ./torchgen/ -name '*.py'` + install -D -pm 644 ./torchgen/dest/lazy_ts_lowering.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torchgen/dest/lazy_ts_lowering.py + for f in `find ./torchgen/ -name '*.py'` + install -D -pm 644 ./torchgen/dest/lazy_ir.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torchgen/dest/lazy_ir.py + for f in `find ./torchgen/ -name '*.py'` + install -D -pm 644 
./torchgen/dest/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torchgen/dest/__init__.py + for f in `find ./torchgen/ -name '*.py'` + install -D -pm 644 ./torchgen/decompositions/gen_jit_decompositions.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torchgen/decompositions/gen_jit_decompositions.py + for f in `find ./torchgen/ -name '*.py'` + install -D -pm 644 ./torchgen/context.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torchgen/context.py + for f in `find ./torchgen/ -name '*.py'` + install -D -pm 644 ./torchgen/code_template.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torchgen/code_template.py + for f in `find ./torchgen/ -name '*.py'` + install -D -pm 644 ./torchgen/api/unboxing.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torchgen/api/unboxing.py + for f in `find ./torchgen/ -name '*.py'` + install -D -pm 644 ./torchgen/api/ufunc.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torchgen/api/ufunc.py + for f in `find ./torchgen/ -name '*.py'` + install -D -pm 644 ./torchgen/api/types/types_base.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torchgen/api/types/types_base.py + for f in `find ./torchgen/ -name '*.py'` + install -D -pm 644 ./torchgen/api/types/types.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torchgen/api/types/types.py + for f in `find ./torchgen/ -name '*.py'` + install -D -pm 644 ./torchgen/api/types/signatures.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torchgen/api/types/signatures.py + for f in `find ./torchgen/ -name '*.py'` + install -D -pm 644 ./torchgen/api/types/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torchgen/api/types/__init__.py + for f in `find ./torchgen/ -name '*.py'` + install -D -pm 644 ./torchgen/api/translate.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torchgen/api/translate.py + for f in `find ./torchgen/ -name '*.py'` + install -D -pm 644 ./torchgen/api/structured.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torchgen/api/structured.py + for f in `find ./torchgen/ -name '*.py'` + install -D -pm 644 ./torchgen/api/python.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torchgen/api/python.py + for f in `find ./torchgen/ -name '*.py'` + install -D -pm 644 ./torchgen/api/native.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torchgen/api/native.py + for f in `find ./torchgen/ -name '*.py'` + install -D -pm 644 ./torchgen/api/meta.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torchgen/api/meta.py + for f in `find ./torchgen/ -name '*.py'` + install -D -pm 644 ./torchgen/api/lazy.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torchgen/api/lazy.py + for f in `find ./torchgen/ -name '*.py'` + install -D -pm 644 ./torchgen/api/functionalization.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torchgen/api/functionalization.py + for f in `find ./torchgen/ -name '*.py'` + install -D -pm 644 ./torchgen/api/dispatcher.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torchgen/api/dispatcher.py + for f in `find ./torchgen/ -name '*.py'` + install -D -pm 644 ./torchgen/api/cpp.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torchgen/api/cpp.py + for f in `find ./torchgen/ -name '*.py'` + install -D -pm 644 ./torchgen/api/autograd.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torchgen/api/autograd.py + for f in `find ./torchgen/ -name '*.py'` + install -D -pm 644 ./torchgen/api/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torchgen/api/__init__.py + for f in `find ./torchgen/ -name '*.py'` + install -D -pm 644 ./torchgen/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./torchgen/__init__.py ++ find ./functorch/ -name '*.py' + for f in `find ./functorch/ -name '*.py'` + install -D -pm 644 ./functorch/op_analysis/gen_data.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./functorch/op_analysis/gen_data.py + for f in `find ./functorch/ -name '*.py'` + install -D -pm 644 ./functorch/notebooks/_src/plot_per_sample_gradients.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./functorch/notebooks/_src/plot_per_sample_gradients.py + for f in `find ./functorch/ -name '*.py'` + install -D -pm 644 
./functorch/notebooks/_src/plot_jacobians_and_hessians.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./functorch/notebooks/_src/plot_jacobians_and_hessians.py + for f in `find ./functorch/ -name '*.py'` + install -D -pm 644 ./functorch/notebooks/_src/plot_ensembling.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./functorch/notebooks/_src/plot_ensembling.py + for f in `find ./functorch/ -name '*.py'` + install -D -pm 644 ./functorch/experimental/ops.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./functorch/experimental/ops.py + for f in `find ./functorch/ -name '*.py'` + install -D -pm 644 ./functorch/experimental/control_flow.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./functorch/experimental/control_flow.py + for f in `find ./functorch/ -name '*.py'` + install -D -pm 644 ./functorch/experimental/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./functorch/experimental/__init__.py + for f in `find ./functorch/ -name '*.py'` + install -D -pm 644 ./functorch/examples/maml_regression/evjang_transforms_module.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./functorch/examples/maml_regression/evjang_transforms_module.py + for f in `find ./functorch/ -name '*.py'` + install -D -pm 644 ./functorch/examples/maml_regression/evjang_transforms.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./functorch/examples/maml_regression/evjang_transforms.py + for f in `find ./functorch/ -name '*.py'` + install -D -pm 644 
./functorch/examples/maml_regression/evjang.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./functorch/examples/maml_regression/evjang.py + for f in `find ./functorch/ -name '*.py'` + install -D -pm 644 ./functorch/examples/maml_omniglot/support/omniglot_loaders.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./functorch/examples/maml_omniglot/support/omniglot_loaders.py + for f in `find ./functorch/ -name '*.py'` + install -D -pm 644 ./functorch/examples/maml_omniglot/maml-omniglot-transforms.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./functorch/examples/maml_omniglot/maml-omniglot-transforms.py + for f in `find ./functorch/ -name '*.py'` + install -D -pm 644 ./functorch/examples/maml_omniglot/maml-omniglot-ptonly.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./functorch/examples/maml_omniglot/maml-omniglot-ptonly.py + for f in `find ./functorch/ -name '*.py'` + install -D -pm 644 ./functorch/examples/maml_omniglot/maml-omniglot-higher.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./functorch/examples/maml_omniglot/maml-omniglot-higher.py + for f in `find ./functorch/ -name '*.py'` + install -D -pm 644 ./functorch/examples/lennard_jones/lennard_jones.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./functorch/examples/lennard_jones/lennard_jones.py + for f in `find ./functorch/ -name '*.py'` + install -D -pm 644 ./functorch/examples/ensembling/parallel_train.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./functorch/examples/ensembling/parallel_train.py
+ for f in `find ./functorch/ -name '*.py'`
+ install -D -pm 644 ./functorch/examples/dp_cifar10/cifar10_transforms.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./functorch/examples/dp_cifar10/cifar10_transforms.py
+ for f in `find ./functorch/ -name '*.py'`
+ install -D -pm 644 ./functorch/examples/dp_cifar10/cifar10_opacus.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./functorch/examples/dp_cifar10/cifar10_opacus.py
+ for f in `find ./functorch/ -name '*.py'`
+ install -D -pm 644 ./functorch/examples/compilation/simple_function.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./functorch/examples/compilation/simple_function.py
+ for f in `find ./functorch/ -name '*.py'`
+ install -D -pm 644 ./functorch/examples/compilation/linear_train.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./functorch/examples/compilation/linear_train.py
+ for f in `find ./functorch/ -name '*.py'`
+ install -D -pm 644 ./functorch/examples/compilation/fuse_module.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./functorch/examples/compilation/fuse_module.py
+ for f in `find ./functorch/ -name '*.py'`
+ install -D -pm 644 ./functorch/examples/compilation/eager_fusion.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./functorch/examples/compilation/eager_fusion.py
+ for f in `find ./functorch/ -name '*.py'`
+ install -D -pm 644 ./functorch/einops/rearrange.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./functorch/einops/rearrange.py
+ for f in `find ./functorch/ -name '*.py'`
+ install -D -pm 644 ./functorch/einops/_parsing.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./functorch/einops/_parsing.py
+ for f in `find ./functorch/ -name '*.py'`
+ install -D -pm 644 ./functorch/einops/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./functorch/einops/__init__.py
+ for f in `find ./functorch/ -name '*.py'`
+ install -D -pm 644 ./functorch/docs/source/conf.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./functorch/docs/source/conf.py
+ for f in `find ./functorch/ -name '*.py'`
+ install -D -pm 644 ./functorch/dim/wrap_type.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./functorch/dim/wrap_type.py
+ for f in `find ./functorch/ -name '*.py'`
+ install -D -pm 644 ./functorch/dim/tree_map.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./functorch/dim/tree_map.py
+ for f in `find ./functorch/ -name '*.py'`
+ install -D -pm 644 ./functorch/dim/reference.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./functorch/dim/reference.py
+ for f in `find ./functorch/ -name '*.py'`
+ install -D -pm 644 ./functorch/dim/op_properties.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./functorch/dim/op_properties.py
+ for f in `find ./functorch/ -name '*.py'`
+ install -D -pm 644 ./functorch/dim/magic_trace.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./functorch/dim/magic_trace.py
+ for f in `find ./functorch/ -name '*.py'`
+ install -D -pm 644 ./functorch/dim/dim.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./functorch/dim/dim.py
+ for f in `find ./functorch/ -name '*.py'`
+ install -D -pm 644 ./functorch/dim/delayed_mul_tensor.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./functorch/dim/delayed_mul_tensor.py
+ for f in `find ./functorch/ -name '*.py'`
+ install -D -pm 644 ./functorch/dim/batch_tensor.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./functorch/dim/batch_tensor.py
+ for f in `find ./functorch/ -name '*.py'`
+ install -D -pm 644 ./functorch/dim/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./functorch/dim/__init__.py
+ for f in `find ./functorch/ -name '*.py'`
+ install -D -pm 644 ./functorch/compile/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./functorch/compile/__init__.py
+ for f in `find ./functorch/ -name '*.py'`
+ install -D -pm 644 ./functorch/benchmarks/process_scorecard.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./functorch/benchmarks/process_scorecard.py
+ for f in `find ./functorch/ -name '*.py'`
+ install -D -pm 644 ./functorch/benchmarks/pointwise_scorecard.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./functorch/benchmarks/pointwise_scorecard.py
+ for f in `find ./functorch/ -name '*.py'`
+ install -D -pm 644 ./functorch/benchmarks/per_sample_grads.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./functorch/benchmarks/per_sample_grads.py
+ for f in `find ./functorch/ -name '*.py'`
+ install -D -pm 644 ./functorch/benchmarks/operator_authoring.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./functorch/benchmarks/operator_authoring.py
+ for f in `find ./functorch/ -name '*.py'`
+ install -D -pm 644 ./functorch/benchmarks/cse.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./functorch/benchmarks/cse.py
+ for f in `find ./functorch/ -name '*.py'`
+ install -D -pm 644 ./functorch/benchmarks/chrome_trace_parser.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./functorch/benchmarks/chrome_trace_parser.py
+ for f in `find ./functorch/ -name '*.py'`
+ install -D -pm 644 ./functorch/_src/vmap/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./functorch/_src/vmap/__init__.py
+ for f in `find ./functorch/ -name '*.py'`
+ install -D -pm 644 ./functorch/_src/make_functional/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./functorch/_src/make_functional/__init__.py
+ for f in `find ./functorch/ -name '*.py'`
+ install -D -pm 644 ./functorch/_src/eager_transforms/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./functorch/_src/eager_transforms/__init__.py
+ for f in `find ./functorch/ -name '*.py'`
+ install -D -pm 644 ./functorch/_src/aot_autograd/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./functorch/_src/aot_autograd/__init__.py
+ for f in `find ./functorch/ -name '*.py'`
+ install -D -pm 644 ./functorch/_src/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./functorch/_src/__init__.py
+ for f in `find ./functorch/ -name '*.py'`
+ install -D -pm 644 ./functorch/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/python3.12/site-packages/./functorch/__init__.py
++ /usr/local/cuda/bin/nvcc --version
++ grep release
++ cut -d, -f2
++ awk '{print $2}'
+ cuver=12.3
+ echo 'from typing import Optional'
+ echo '__all__ = ['\''__version__'\'', '\''debug'\'', '\''cuda'\'', '\''git_version'\'', '\''hip'\'']'
+ echo '__version__ = '\''2.4.0'\'''
+ echo 'debug = False'
+ echo 'cuda: Optional[str] = '\''12.3'\'''
+ echo 'git_version = '\''7efaf54dc46034189cb36b345764a5a9a5b693d4'\'''
+ echo 'hip: Optional[str] = None'
+ mv -f /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//builddir/build/BUILD/pytorch/nvfuser/nvfuser.so /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/
mv: cannot stat '/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//builddir/build/BUILD/pytorch/nvfuser/nvfuser.so': No such file or directory
+ true
+ mv -f /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//builddir/build/BUILD/pytorch/torch/lib/libnvfuser_codegen.so /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/
mv: cannot stat '/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//builddir/build/BUILD/pytorch/torch/lib/libnvfuser_codegen.so': No such file or directory
+ true
+ rm -rf
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/include/fmt
+ rm -rf /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/include/clog.h
+ rm -rf /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/include/xnnpack.h
+ rm -rf /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//builddir/build/BUILD/pytorch/test
+ rm -rf /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//builddir/build/BUILD/pytorch/nvfuser
+ rm -rf /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/cmake/fmt
+ rm -rf /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64//usr/lib64/pkgconfig/fmt.pc
+ find /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64 -name functorch.so -exec rm -f '{}' ';'
+ /usr/bin/python3 setup.py egg_info
Building wheel torch-2.4.0a0+git7efaf54
running egg_info
creating torch.egg-info
writing torch.egg-info/PKG-INFO
writing dependency_links to torch.egg-info/dependency_links.txt
writing entry points to torch.egg-info/entry_points.txt
writing requirements to torch.egg-info/requires.txt
writing top-level names to torch.egg-info/top_level.txt
writing manifest file 'torch.egg-info/SOURCES.txt'
reading manifest file 'torch.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
warning: no previously-included files matching '*.o' found anywhere in distribution
warning: no previously-included files matching '*.so' found anywhere in distribution
warning: no previously-included files matching '*.dylib' found anywhere in distribution
warning: no previously-included files matching '*.a' found anywhere in distribution
warning: no previously-included files matching '*.swp' found anywhere in distribution
adding license file 'LICENSE'
adding license file 'NOTICE'
writing manifest file 'torch.egg-info/SOURCES.txt'
+ cp -r torch.egg-info /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64/usr/lib64/python3.12/site-packages/
+ sed -i '/^\[/!s/[<=>].*//g' /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64/usr/lib64/python3.12/site-packages/torch.egg-info/requires.txt
+ sed -i /triton/d /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64/usr/lib64/python3.12/site-packages/torch.egg-info/requires.txt
+ set +x
Stripping: /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64/usr/bin/torch_shm_manager
Stripping: /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64/usr/lib64/libc10.so.2.4.0
Stripping: /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64/usr/lib64/libc10_cuda.so
Stripping: /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64/usr/lib64/libcaffe2_nvrtc.so
Stripping: /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64/usr/lib64/libnnapi_backend.so
Stripping: /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64/usr/lib64/libshm.so.2.4.0
Stripping: /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64/usr/lib64/libtorch.so.2.4.0
Stripping: /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64/usr/lib64/libtorch_cpu.so.2.4.0
Stripping: /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64/usr/lib64/libtorch_cuda.so
Stripping: /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64/usr/lib64/libtorch_cuda_linalg.so
Stripping: /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64/usr/lib64/libtorch_global_deps.so.2.4.0
Stripping: /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64/usr/lib64/libtorch_python.so.2.4.0
Stripping: /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64/usr/lib64/python3.12/site-packages/functorch/_C.so
Stripping: /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64/usr/lib64/python3.12/site-packages/torch/_C.so
+ /usr/lib/rpm/check-buildroot
+ /usr/lib/rpm/redhat/brp-ldconfig
+ /usr/lib/rpm/brp-compress
+ /usr/lib/rpm/brp-strip /usr/bin/strip
+ /usr/lib/rpm/brp-strip-comment-note /usr/bin/strip /usr/bin/objdump
+ /usr/lib/rpm/redhat/brp-strip-lto /usr/bin/strip
+ /usr/lib/rpm/brp-strip-static-archive /usr/bin/strip
+ /usr/lib/rpm/check-rpaths
+ /usr/lib/rpm/redhat/brp-mangle-shebangs
+ /usr/lib/rpm/brp-remove-la-files
+ env /usr/lib/rpm/redhat/brp-python-bytecompile '' 1 0 -j4
Bytecompiling .py files below /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64/usr/lib64/python3.12 using python3.12
+ /usr/lib/rpm/redhat/brp-python-hardlink
Processing files: pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64
Executing(%doc): /bin/sh -e /var/tmp/rpm-tmp.u50cEK
+ umask 022
+ cd /builddir/build/BUILD
+ cd pytorch
+ DOCDIR=/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64/usr/share/doc/pytorch
+ export LC_ALL=
+ LC_ALL=
+ export DOCDIR
+ /usr/bin/mkdir -p /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64/usr/share/doc/pytorch
+ cp -pr /builddir/build/BUILD/pytorch/README.md /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64/usr/share/doc/pytorch
+ cp -pr /builddir/build/BUILD/pytorch/CONTRIBUTING.md /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64/usr/share/doc/pytorch
+ RPM_EC=0
++ jobs -p
+ exit 0
Executing(%license): /bin/sh -e /var/tmp/rpm-tmp.XWbuzW
+ umask 022
+ cd /builddir/build/BUILD
+ cd pytorch
+ LICENSEDIR=/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64/usr/share/licenses/pytorch
+ export LC_ALL=
+ LC_ALL=
+ export LICENSEDIR
+ /usr/bin/mkdir -p /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64/usr/share/licenses/pytorch
+ cp -pr /builddir/build/BUILD/pytorch/LICENSE /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64/usr/share/licenses/pytorch
+ RPM_EC=0
++ jobs -p
+ exit 0
Provides: libc10.so.2.4()(64bit) libc10_cuda.so()(64bit) libcaffe2_nvrtc.so()(64bit) libnnapi_backend.so()(64bit) libshm.so.2.4()(64bit) libtorch.so.2.4()(64bit) libtorch_cpu.so.2.4()(64bit) libtorch_cuda.so()(64bit) libtorch_cuda_linalg.so()(64bit) libtorch_global_deps.so.2.4()(64bit) pytorch = 2.4.0-20240412.0.git7efaf54d.cu12_3.fc40 pytorch(aarch-64) = 2.4.0-20240412.0.git7efaf54d.cu12_3.fc40
Requires(rpmlib): rpmlib(CompressedFileNames) <= 3.0.4-1 rpmlib(FileDigests) <= 4.6.0-1 rpmlib(PayloadFilesHavePrefix) <= 4.0-1
Requires: libc.so.6()(64bit) libc.so.6(GLIBC_2.17)(64bit) libc.so.6(GLIBC_2.28)(64bit) libc.so.6(GLIBC_2.32)(64bit) libc.so.6(GLIBC_2.33)(64bit) libc.so.6(GLIBC_2.34)(64bit) libc.so.6(GLIBC_2.38)(64bit) libc10.so.2.4()(64bit) libc10_cuda.so()(64bit) libcpuinfo.so.1()(64bit) libcublas.so.12()(64bit) libcublas.so.12(libcublas.so.12)(64bit) libcublasLt.so.12()(64bit) libcublasLt.so.12(libcublasLt.so.12)(64bit) libcuda.so.1()(64bit) libcudart.so.12()(64bit) libcudart.so.12(libcudart.so.12)(64bit) libcudnn.so.8()(64bit) libcudnn.so.8(libcudnn.so.8)(64bit) libcufft.so.11()(64bit) libcufft.so.11(libcufft.so.11)(64bit) libcurand.so.10()(64bit) libcusolver.so.11()(64bit) libcusolver.so.11(libcusolver.so.11)(64bit) libcusparse.so.12()(64bit) libcusparse.so.12(libcusparse.so.12)(64bit) libfoxi_loader.so.1()(64bit) libgcc_s.so.1()(64bit) libgcc_s.so.1(GCC_3.0)(64bit) libgcc_s.so.1(GCC_4.2.0)(64bit) libgcc_s.so.1(GCC_4.5.0)(64bit) libgflags.so.2.2()(64bit) libglog.so.0()(64bit)
libgloo.so.1()(64bit) libgloo_cuda.so.1()(64bit) libgomp.so.1()(64bit) libgomp.so.1(GOMP_4.0)(64bit) libgomp.so.1(OMP_1.0)(64bit) libhiredis.so.1.0.0()(64bit) libkineto.so.1()(64bit) libleveldb.so.1()(64bit) liblmdb.so.0.0.0()(64bit) libm.so.6()(64bit) libm.so.6(GLIBC_2.17)(64bit) libm.so.6(GLIBC_2.23)(64bit) libm.so.6(GLIBC_2.27)(64bit) libm.so.6(GLIBC_2.29)(64bit) libm.so.6(GLIBC_2.35)(64bit) libm.so.6(GLIBC_2.38)(64bit) libmagma.so.1()(64bit) libnccl.so.2()(64bit) libnnpack.so.1()(64bit) libnuma.so.1()(64bit) libnuma.so.1(libnuma_1.1)(64bit) libnuma.so.1(libnuma_1.2)(64bit) libnvToolsExt.so.1()(64bit) libnvToolsExt.so.1(libnvToolsExt.so.1)(64bit) libnvrtc.so.12()(64bit) libnvrtc.so.12(libnvrtc.so.12)(64bit) libonnx.so()(64bit) libonnx_optimizer.so()(64bit) libonnx_proto.so()(64bit) libopenblaso.so.0()(64bit) libopencv_calib3d.so.409()(64bit) libopencv_core.so.409()(64bit) libopencv_cudev.so.409()(64bit) libopencv_dnn.so.409()(64bit) libopencv_features2d.so.409()(64bit) libopencv_flann.so.409()(64bit) libopencv_highgui.so.409()(64bit) libopencv_imgcodecs.so.409()(64bit) libopencv_imgproc.so.409()(64bit) libopencv_optflow.so.409()(64bit) libopencv_video.so.409()(64bit) libopencv_videoio.so.409()(64bit) libopencv_ximgproc.so.409()(64bit) libprotobuf.so.32()(64bit) libpthreadpool.so.1()(64bit) libqnnpack.so.1()(64bit) libshm.so.2.4()(64bit) libsleef.so.3()(64bit) libsnappy.so.1()(64bit) libstdc++.so.6()(64bit) libstdc++.so.6(CXXABI_1.3)(64bit) libstdc++.so.6(CXXABI_1.3.11)(64bit) libstdc++.so.6(CXXABI_1.3.13)(64bit) libstdc++.so.6(CXXABI_1.3.15)(64bit) libstdc++.so.6(CXXABI_1.3.2)(64bit) libstdc++.so.6(CXXABI_1.3.3)(64bit) libstdc++.so.6(CXXABI_1.3.5)(64bit) libstdc++.so.6(CXXABI_1.3.7)(64bit) libstdc++.so.6(CXXABI_1.3.8)(64bit) libstdc++.so.6(CXXABI_1.3.9)(64bit) libstdc++.so.6(GLIBCXX_3.4)(64bit) libstdc++.so.6(GLIBCXX_3.4.11)(64bit) libstdc++.so.6(GLIBCXX_3.4.14)(64bit) libstdc++.so.6(GLIBCXX_3.4.15)(64bit) libstdc++.so.6(GLIBCXX_3.4.17)(64bit) 
libstdc++.so.6(GLIBCXX_3.4.18)(64bit) libstdc++.so.6(GLIBCXX_3.4.19)(64bit) libstdc++.so.6(GLIBCXX_3.4.20)(64bit) libstdc++.so.6(GLIBCXX_3.4.21)(64bit) libstdc++.so.6(GLIBCXX_3.4.22)(64bit) libstdc++.so.6(GLIBCXX_3.4.26)(64bit) libstdc++.so.6(GLIBCXX_3.4.29)(64bit) libstdc++.so.6(GLIBCXX_3.4.30)(64bit) libstdc++.so.6(GLIBCXX_3.4.32)(64bit) libstdc++.so.6(GLIBCXX_3.4.9)(64bit) libtensorpipe.so.1()(64bit) libtensorpipe_cuda.so.1()(64bit) libtorch.so.2.4()(64bit) libtorch_cpu.so.2.4()(64bit) libtorch_cuda.so()(64bit) libtorch_python.so.2.4()(64bit) libzmq.so.5()(64bit) rtld(GNU_HASH)
Processing files: pytorch-devel-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64
Provides: cmake(ATen) cmake(Caffe2) cmake(Torch) = 2.4.0 cmake(aten) cmake(caffe2) cmake(torch) = 2.4.0 pytorch-devel = 2.4.0-20240412.0.git7efaf54d.cu12_3.fc40 pytorch-devel(aarch-64) = 2.4.0-20240412.0.git7efaf54d.cu12_3.fc40
Requires(rpmlib): rpmlib(CompressedFileNames) <= 3.0.4-1 rpmlib(FileDigests) <= 4.6.0-1 rpmlib(PayloadFilesHavePrefix) <= 4.0-1
Requires: cmake-filesystem libc10.so.2.4()(64bit) libshm.so.2.4()(64bit) libtorch.so.2.4()(64bit) libtorch_cpu.so.2.4()(64bit) libtorch_global_deps.so.2.4()(64bit)
Processing files: pytorch-python3-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64
warning: absolute symlink: /usr/lib64/python3.12/site-packages/torch/bin/torch_shm_manager -> /usr/bin/torch_shm_manager
warning: absolute symlink: /usr/lib64/python3.12/site-packages/torch/include -> /usr/include
warning: absolute symlink: /usr/lib64/python3.12/site-packages/torch/lib -> /usr/lib64
Provides: libtorch_python.so.2.4()(64bit) python3.12dist(torch) = 2.4.0 python3.12dist(torch) = 2.4~a0 python3dist(torch) = 2.4~a0 pytorch-python3 = 2.4.0-20240412.0.git7efaf54d.cu12_3.fc40 pytorch-python3(aarch-64) = 2.4.0-20240412.0.git7efaf54d.cu12_3.fc40
Requires(rpmlib): rpmlib(CompressedFileNames) <= 3.0.4-1 rpmlib(FileDigests) <= 4.6.0-1 rpmlib(PartialHardlinkSets) <= 4.0.4-1 rpmlib(PayloadFilesHavePrefix) <= 4.0-1
Requires: libc.so.6()(64bit) libc.so.6(GLIBC_2.17)(64bit) libc.so.6(GLIBC_2.32)(64bit) libc.so.6(GLIBC_2.34)(64bit) libc.so.6(GLIBC_2.38)(64bit) libc10.so.2.4()(64bit) libc10_cuda.so()(64bit) libcudart.so.12()(64bit) libcudart.so.12(libcudart.so.12)(64bit) libcudnn.so.8()(64bit) libcudnn.so.8(libcudnn.so.8)(64bit) libgcc_s.so.1()(64bit) libgcc_s.so.1(GCC_3.0)(64bit) libgcc_s.so.1(GCC_4.5.0)(64bit) libglog.so.0()(64bit) libnvToolsExt.so.1()(64bit) libnvToolsExt.so.1(libnvToolsExt.so.1)(64bit) libprotobuf.so.32()(64bit) libshm.so.2.4()(64bit) libstdc++.so.6()(64bit) libstdc++.so.6(CXXABI_1.3)(64bit) libstdc++.so.6(CXXABI_1.3.11)(64bit) libstdc++.so.6(CXXABI_1.3.13)(64bit) libstdc++.so.6(CXXABI_1.3.15)(64bit) libstdc++.so.6(CXXABI_1.3.2)(64bit) libstdc++.so.6(CXXABI_1.3.3)(64bit) libstdc++.so.6(CXXABI_1.3.5)(64bit) libstdc++.so.6(CXXABI_1.3.8)(64bit) libstdc++.so.6(CXXABI_1.3.9)(64bit) libstdc++.so.6(GLIBCXX_3.4)(64bit) libstdc++.so.6(GLIBCXX_3.4.11)(64bit) libstdc++.so.6(GLIBCXX_3.4.14)(64bit) libstdc++.so.6(GLIBCXX_3.4.15)(64bit) libstdc++.so.6(GLIBCXX_3.4.18)(64bit) libstdc++.so.6(GLIBCXX_3.4.19)(64bit) libstdc++.so.6(GLIBCXX_3.4.20)(64bit) libstdc++.so.6(GLIBCXX_3.4.21)(64bit) libstdc++.so.6(GLIBCXX_3.4.22)(64bit) libstdc++.so.6(GLIBCXX_3.4.26)(64bit) libstdc++.so.6(GLIBCXX_3.4.29)(64bit) libstdc++.so.6(GLIBCXX_3.4.30)(64bit) libstdc++.so.6(GLIBCXX_3.4.32)(64bit) libstdc++.so.6(GLIBCXX_3.4.9)(64bit) libtorch.so.2.4()(64bit) libtorch_cpu.so.2.4()(64bit) libtorch_cuda.so()(64bit) libtorch_python.so.2.4()(64bit) python(abi) = 3.12 python3.12dist(filelock) python3.12dist(fsspec) python3.12dist(jinja2) python3.12dist(networkx) python3.12dist(sympy) python3.12dist(typing-extensions) >= 4.8 rtld(GNU_HASH)
Checking for unpackaged file(s): /usr/lib/rpm/check-files /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64
Wrote: /builddir/build/RPMS/pytorch-devel-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64.rpm
Wrote: /builddir/build/RPMS/pytorch-python3-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64.rpm
Wrote: /builddir/build/RPMS/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64.rpm
Executing(%clean): /bin/sh -e /var/tmp/rpm-tmp.PQJfZV
+ umask 022
+ cd /builddir/build/BUILD
+ cd pytorch
+ /usr/bin/rm -rf /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.aarch64
+ RPM_EC=0
++ jobs -p
+ exit 0
Executing(rmbuild): /bin/sh -e /var/tmp/rpm-tmp.6tjcyC
+ umask 022
+ cd /builddir/build/BUILD
+ rm -rf /builddir/build/BUILD/pytorch-SPECPARTS
+ rm -rf pytorch pytorch.gemspec
+ RPM_EC=0
++ jobs -p
+ exit 0
RPM build warnings:
    %patchN is deprecated (2 usages found), use %patch N (or %patch -P N)
    absolute symlink: /usr/lib64/python3.12/site-packages/torch/bin/torch_shm_manager -> /usr/bin/torch_shm_manager
    absolute symlink: /usr/lib64/python3.12/site-packages/torch/include -> /usr/include
    absolute symlink: /usr/lib64/python3.12/site-packages/torch/lib -> /usr/lib64
Finish: rpmbuild pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.src.rpm
Finish: build phase for pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.src.rpm
INFO: chroot_scan: 1 files copied to /var/lib/copr-rpmbuild/results/chroot_scan
INFO: /var/lib/mock/fedora-40-aarch64-1712885791.289313/root/var/log/dnf5.log
INFO: Done(/var/lib/copr-rpmbuild/results/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc40.src.rpm) Config(child) 430 minutes 16 seconds
INFO: Results and/or logs in: /var/lib/copr-rpmbuild/results
INFO: Cleaning up build root ('cleanup_on_success=True')
Start: clean chroot
INFO: unmounting tmpfs.
Finish: clean chroot
Finish: run
Running RPMResults tool
Package info: {
    "packages": [
        {
            "name": "pytorch-python3",
            "epoch": null,
            "version": "2.4.0",
            "release": "20240412.0.git7efaf54d.cu12_3.fc40",
            "arch": "aarch64"
        },
        {
            "name": "pytorch",
            "epoch": null,
            "version": "2.4.0",
            "release": "20240412.0.git7efaf54d.cu12_3.fc40",
            "arch": "src"
        },
        {
            "name": "pytorch-devel",
            "epoch": null,
            "version": "2.4.0",
            "release": "20240412.0.git7efaf54d.cu12_3.fc40",
            "arch": "aarch64"
        },
        {
            "name": "pytorch",
            "epoch": null,
            "version": "2.4.0",
            "release": "20240412.0.git7efaf54d.cu12_3.fc40",
            "arch": "aarch64"
        }
    ]
}
RPMResults finished